This page summarizes important changes to our methodology and data sources that we expect to significantly affect the Unpaywall dataset.

2019-12-08: added articles from Semantic Scholar:

We're adding about 8 million PDFs hosted by Semantic Scholar. We already have OA locations for many of these articles, but we expect the addition to yield 3 million new Green OA articles by the end of 2019.

2019-11-14: improved PDF validation:

Our automated PDF validation processes are now much more robust, allowing us to add about 1.5 million new OA articles. Half of these are in newly identified Gold OA journals that we were previously unable to spot because their articles looked unavailable to us.