In each oa_location, the purpose of oa_date is to tell you when the full text of this version of the article was first available at this location. Here’s an example of an article published in an OA journal:
10.5713/ajas.18.0801 (published 2015-01-27) has these oa_locations, among others:
"evidence": "oa journal (via doaj)",
"evidence": "oa repository (via OAI-PMH doi match)",
"repository_institution": "PubMed Central - Europe PMC"
Since this article was published in an OA journal, it was available from the publisher immediately. Then a few months later, full text was also posted to Europe PMC.
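As a rough illustration of how a consumer might use per-location dates, the sketch below picks the earliest non-null oa_date across an article's oa_locations. The helper name `earliest_oa_date` and the sample records (including their dates) are hypothetical placeholders, not real values for the article above.

```python
from datetime import date
from typing import List, Optional


def earliest_oa_date(oa_locations: List[dict]) -> Optional[date]:
    """Return the earliest known oa_date, or None if every location has a null date."""
    dates = [
        date.fromisoformat(loc["oa_date"])
        for loc in oa_locations
        if loc.get("oa_date")  # skips missing keys and null (None) values
    ]
    return min(dates) if dates else None


# Hypothetical records: a publisher copy, a later repository copy, and a null date.
sample_locations = [
    {"evidence": "oa journal (via doaj)", "oa_date": "2021-01-10"},
    {"evidence": "oa repository (via OAI-PMH doi match)", "oa_date": "2021-05-02"},
    {"evidence": "oa repository (via OAI-PMH doi match)", "oa_date": None},
]

first_available = earliest_oa_date(sample_locations)
```

Locations with a null oa_date are ignored rather than treated as "never available", since null only means the date is not known with confidence.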
If we’re not confident in our estimate of the date, we set oa_date to null. How oa_date is calculated depends on the type of oa_location, its oa_status, and in some cases the metadata we have for individual repositories.
Gold: This one is easy - the article is free at the time of publication. oa_date = published_date.
Hybrid: Also easy, but not as obvious. By “Hybrid” we mean the article has been published with an OA license in an otherwise toll-access journal.
If the published version of the article is available immediately, oa_date = published_date.
If a submitted or accepted manuscript is available under a license separate from that of the published version, oa_date = manuscript license effective date.
Bronze: Although we may support oa_date for bronze in the future, currently the oa_date for bronze articles is always null. This is for a few reasons:
Bronze OA can come and go. If we record the first date we find the article, there is no guarantee it was continuously available from that date until now.
We’re more likely to discover Bronze OA after a significant delay than other types, so we’re less confident in the date.
Bronze is rarely relevant to OA mandates.
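The Gold, Hybrid, and Bronze rules above can be summarized in a short sketch. This is an illustration under stated assumptions, not the actual implementation: the function name `publisher_oa_date` and the `manuscript_license_date` parameter are hypothetical, and "the published version is available immediately" is modeled simply as the location's version being `publishedVersion`.

```python
from datetime import date
from typing import Optional


def publisher_oa_date(
    oa_status: str,
    published_date: Optional[date],
    version: str = "publishedVersion",
    manuscript_license_date: Optional[date] = None,  # hypothetical input
) -> Optional[date]:
    """Sketch of the oa_date rules for publisher-hosted locations."""
    if oa_status == "gold":
        # Free at the time of publication.
        return published_date
    if oa_status == "hybrid":
        if version == "publishedVersion":
            # Published version available immediately.
            return published_date
        # Submitted or accepted manuscript under its own license.
        return manuscript_license_date
    # Bronze (and anything unrecognized): no confident date.
    return None
```

For example, a bronze location always yields `None`, while a hybrid accepted manuscript yields the date its separate license took effect, if one is known.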
Determining an oa_date for repository locations is challenging. In brief, we create repository locations by:
querying an institutional repository for OAI-PMH records,
using URLs in those records to locate full-text copies of articles, and
matching the articles we find to published articles by DOI or by title and author.
Each OAI-PMH record has a timestamp that tells when it was last modified. In principle, we could use this to estimate when the full-text article was first posted, but there are two problems:
The record can be created well before full text is posted, so it could be earlier than the actual OA date.
The record can continue to be modified, updating its timestamp, long after full text is posted, so it could be later than the actual OA date.
So we can’t just take the record timestamp at face value. We have to record the timestamp when we first find the full text article, then preserve that as the first availability date.
We started recording these timestamps on 2020-08-07, so we only have reliable information for articles first posted on or after that date.
Articles posted 2020-08-07 or later: When we discover an article in a repository, we record the date portion of the OAI-PMH record timestamp. This date is frozen - it doesn’t change even if the record timestamp does. Internally, we call this date PmhVersionFirstAvailable. oa_date = PmhVersionFirstAvailable.
Articles posted before 2020-08-07: Because these timestamps could have been updated after the article was posted, there is too much uncertainty to use them. oa_date = null.
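The "freeze on first discovery" behavior can be sketched as follows. This is a minimal illustration, not the production code: the class `FirstAvailableStore`, its in-memory dictionary, and the `location_key` argument are hypothetical, and treating any frozen date earlier than 2020-08-07 as null is an assumption about how the cutoff is applied.

```python
from datetime import date, datetime
from typing import Dict, Optional

# Date on which recording of first-seen timestamps began (from the text above).
TRACKING_START = date(2020, 8, 7)


class FirstAvailableStore:
    """Hypothetical in-memory stand-in for PmhVersionFirstAvailable storage."""

    def __init__(self) -> None:
        self._first_available: Dict[str, date] = {}

    def observe(self, location_key: str, record_timestamp: datetime) -> date:
        """Record the date part of the timestamp the first time full text is found.

        Later calls for the same key leave the stored date unchanged, even if
        the OAI-PMH record timestamp has moved forward.
        """
        return self._first_available.setdefault(location_key, record_timestamp.date())

    def oa_date(self, location_key: str) -> Optional[date]:
        """Return the frozen date, or None when it is unknown or predates tracking."""
        first = self._first_available.get(location_key)
        if first is None or first < TRACKING_START:
            return None
        return first


store = FirstAvailableStore()
store.observe("repo-a:record-1", datetime(2021, 3, 1, 12, 30))
# The record is modified later; the frozen date must not change.
store.observe("repo-a:record-1", datetime(2022, 1, 15, 8, 0))
store.observe("repo-b:record-9", datetime(2019, 5, 5, 0, 0))
```

Here `store.oa_date("repo-a:record-1")` stays at the first observed date, while the pre-cutoff record and any unknown key return `None`.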