This page summarizes important changes to our methodology and data sources that we expect to significantly affect the Unpaywall dataset.
2019-12-08: added articles from Semantic Scholar:
We're adding about 8 million PDFs hosted by Semantic Scholar. We already have OA locations for many of these articles, but we expect this addition to yield 3 million new Green OA articles by the end of 2019.
2019-11-14: improved PDF validation:
Our automated PDF validation processes are now much more robust, allowing us to add about 1.5 million new OA articles. Half of these are in newly identified Gold OA journals that we were previously unable to spot because their articles appeared unavailable to us.