This page summarizes important changes to our methodology and data sources that we expect to significantly affect the Unpaywall dataset.




2019-12-08: added articles from Semantic Scholar:


We're adding about 8 million PDFs hosted by Semantic Scholar. We already have OA locations for many of these articles, but we expect the addition to yield 3 million new Green OA articles by the end of 2019.



2019-11-14: improved PDF validation:


Our automated PDF validation processes are now much more robust, allowing us to add about 1.5 million new OA articles. Half of these are in newly identified Gold OA journals, which we previously could not detect because their articles looked unavailable to us.