# Data format

{% hint style="danger" %}
This data schema is in beta and can change with little notice.
{% endhint %}

OA.Report collects data on scholarly works such as papers, preprints, datasets, and code. Some of that data is generated by OA.Works and some is reused from other open sources.

For an example of a record with most of the possible fields, go to [https://api.oa.works/report/works/10.1016/j.ijid.2020.05.122](https://beta.oa.works/report/works/10.1016/j.ijid.2020.05.122)

## Generated data

#### `can_archive`

*Boolean:* `true` if the work can be self-archived in a repository.

*Source*: [ShareYourPaper Permissions](https://shareyourpaper.org/permissions)\
*Updated*: Daily (premium), occasionally (free)

#### `version`

*String:* The version of the work that can be self-archived in a repository.

`version: "acceptedVersion"`

Values are based on the [DRIVER Guidelines versioning scheme](https://wiki.surfnet.nl/display/DRIVERguidelines/DRIVER-VERSION+Mappings).

*Source*: [ShareYourPaper Permissions](https://shareyourpaper.org/permissions)\
*Updated*: Daily (premium), occasionally (free)

#### `journal_oa_type`

*String:* The journal's OA type.

{% hint style="info" %}
Think of this as [`oa_status`](https://unpaywall.org/data-format) for a journal.
{% endhint %}

Values include:

* `gold`: The journal's whole output is published Open Access.
* `hybrid`: The journal allows some articles to be published Open Access.
* `transformative`: The journal allows some articles to be published Open Access, and is listed by cOAlition S as a transformative journal.
* `diamond`: The journal's whole output is published Open Access, with no APC.
* `closed`: The journal's output is entirely behind a paywall, or bronze.
* `not applicable`: Used when the work is not in a journal (typically, a preprint).

`journal_oa_type: "diamond"`

*Source*: OA.Works\
*Updated*: Daily (premium), occasionally (free)
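
As a rough illustration of working with this field, the sketch below tallies records by `journal_oa_type`. It assumes records are plain dictionaries and that the values listed above are the ones you expect to see; the helper name `tally_journal_oa_types` and the sample records are hypothetical, and any other value is counted as `unknown`.

```python
from collections import Counter

# Values listed in this section; anything else is bucketed as "unknown".
KNOWN_JOURNAL_OA_TYPES = {
    "gold", "hybrid", "transformative", "diamond", "closed", "not applicable",
}


def tally_journal_oa_types(records):
    """Count records per journal_oa_type, bucketing unexpected or missing values."""
    counts = Counter()
    for record in records:
        value = record.get("journal_oa_type")
        counts[value if value in KNOWN_JOURNAL_OA_TYPES else "unknown"] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample = [
    {"journal_oa_type": "diamond"},
    {"journal_oa_type": "hybrid"},
    {"journal_oa_type": "diamond"},
    {},  # record with the field missing
]
print(tally_journal_oa_types(sample))
```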

#### `pmc_has_data_availability_statement`

*Boolean:* `true` if PMC reports the article as having a data availability statement.

*Source*: PMC\
*Updated*: Weekly (premium), occasionally (free)

#### `has_data_availability_statement`

*Boolean:* `true` if the article has a data availability (or resource availability) statement.

*Source*: PMC\
*Updated*: Weekly (premium), occasionally (free)

#### `data_availability_statement`

*String:* The full text of the data availability statement.\
*Source*: OA.Works\
*Updated*: Weekly (premium), occasionally (free)

#### `data_availability_doi`

*Array:* Any DOIs found in `data_availability_statement`.\
*Source*: OA.Works\
*Updated*: Weekly (premium), occasionally (free)

#### `data_availability_url`

*Array:* Any URLs found in `data_availability_statement`.\
*Source*: OA.Works\
*Updated*: Weekly (premium), occasionally (free)
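
To give a feel for how `data_availability_doi` and `data_availability_url` relate to `data_availability_statement`, here is a minimal sketch that pulls DOIs and URLs out of a statement with regular expressions. This is not OA.Works's actual extraction method; the patterns, the helper name `extract_identifiers`, and the sample statement are all assumptions made for illustration.

```python
import re

# Simplified patterns (assumptions): real-world DOIs and URLs have more edge cases.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[^\s\"<>]+")
URL_PATTERN = re.compile(r"https?://[^\s\"<>]+")


def _clean(match):
    """Drop punctuation that usually belongs to the sentence, not the identifier."""
    return match.rstrip(".,;)")


def extract_identifiers(statement):
    """Return the DOIs and URLs found in a data availability statement."""
    return {
        "data_availability_doi": [_clean(m) for m in DOI_PATTERN.findall(statement)],
        "data_availability_url": [_clean(m) for m in URL_PATTERN.findall(statement)],
    }


# Hypothetical statement, for illustration only.
statement = (
    "Data are available at https://example.org/dataset "
    "and via Dryad (doi:10.5061/dryad.abc123)."
)
print(extract_identifiers(statement))
```

Note that a DOI written as a `https://doi.org/...` link would be picked up by both patterns in this sketch.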

#### `email`

*String:* The corresponding author's email address

`email: "example@place.edu"`

{% hint style="info" %}
Most emails are encrypted unless you're logged in and viewing emails associated with your organization.
{% endhint %}

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `author_email_name`

*String:* The corresponding author's name for use in emails

`author_email_name: "Dr.Who"`

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `crossref_is_oa`

*Boolean:* `true` if Crossref data suggests the article is free to read.

*Source:* Crossref\
*Updated:* Weekly (premium), occasionally (free)

#### `updated`

*String:* Timestamp of when the record was last updated.

`updated: "1675693406601"`

*Source:* OA.Works\
*Updated:* Weekly (premium), occasionally (free)
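
Judging by the example value, `updated` looks like a Unix timestamp in milliseconds stored as a string. Under that assumption (the schema does not state the unit explicitly), it could be converted to a timezone-aware datetime as sketched below; `parse_updated` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_updated(value):
    """Convert an `updated` string, assumed to be epoch milliseconds, to UTC."""
    # Integer milliseconds avoid floating-point rounding on the fractional part.
    return EPOCH + timedelta(milliseconds=int(value))


print(parse_updated("1675693406601").isoformat())
```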

### `supplements`

Each of these keys is found in the `supplements` object.

{% hint style="info" %}
While the name suggests these are secondary, they are in fact critical to OA.Report. They were given this name because they "supplement" the open data; they have nothing to do with the supplemental information you might find in a research article.
{% endhint %}

#### `publisher_license_best`

*String:* The license applied to this work by the publisher as best we can determine.

`publisher_license_best: "cc-by"`

*Source*: Unpaywall, Crossref, and manual collection can be used to support this designation.\
*Updated*: Weekly (premium), occasionally (free)

#### `repository_license_best`

*String:* The license applied to this work by the repository as best we can determine.

`repository_license_best: "cc-by"`

*Source*: Data from Unpaywall and Europe PMC can be used to support this designation.\
*Updated*: Weekly (premium), occasionally (free)

#### `is_preprint`

*Boolean:* `true` if the article is on a preprint server

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `has_preprint_copy`

*Boolean:* `true` if the article has a version on a preprint server

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `preprint_doi`

*String:* The DOI of the article's preprint.

`preprint_doi: "10.21203/rs.3.rs-805463/v1"`

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `has_data_availability_statement`

*Boolean:* `true` if the work has a data or resource availability statement

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `has_made_data`

*Boolean:* `true` if the article uses data the authors made in the process of research

*Source:* DataSeer\
*Updated:* As requested (premium)

#### `has_shared_data`

*Boolean:* `true` if the article shared the data in some location (e.g., in the supplements, the article itself, a data repository, or the authors' website).

*Source:* DataSeer\
*Updated:* As requested (premium)

#### `has_open_data`

*Boolean:* `true` if the authors shared their data and licensed it under CC BY or CC0.

*Source:* OA.Works\
*Updated:* As requested (premium)

#### `has_reused_data`

*Boolean:* `true` if the work relies on data not created by the authors

*Source:* DataSeer\
*Updated:* As requested (premium)

#### `has_made_code`

*Boolean:* `true` if the article uses code the authors made in the process of research

*Source:* DataSeer\
*Updated:* As requested (premium)

#### `has_shared_code`

*Boolean:* `true` if the article shared the code in some location (e.g., in the supplements, the article itself, a data repository, or the authors' website).

*Source:* DataSeer\
*Updated:* As requested (premium)

#### `has_open_code`

*Boolean:* `true` if the authors shared their code and licensed it under a permissive open source license (e.g., MIT).

*Source:* OA.Works\
*Updated:* As requested (premium)
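
The made, shared, and open flags for data and code form a natural funnel. The sketch below counts how many records reach each step. It assumes each record is a dictionary with a nested `supplements` dictionary holding these booleans; `sharing_funnel` and the sample records are hypothetical.

```python
def sharing_funnel(records, kind):
    """Count records with has_made_/has_shared_/has_open_ flags set for `kind`.

    `kind` is "data" or "code". Missing flags are treated as not set.
    """
    steps = [f"has_made_{kind}", f"has_shared_{kind}", f"has_open_{kind}"]
    return {
        step: sum(1 for r in records if r.get("supplements", {}).get(step) is True)
        for step in steps
    }


# Hypothetical records, for illustration only.
records = [
    {"supplements": {"has_made_data": True, "has_shared_data": True, "has_open_data": True}},
    {"supplements": {"has_made_data": True, "has_shared_data": True, "has_open_data": False}},
    {"supplements": {"has_made_data": True}},
    {"supplements": {}},
]
print(sharing_funnel(records, "data"))
```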

#### `resource_doi`

*String:* DOI(s) found associated with the work (could be for a dataset, codebase, or something else)

*Source:* OA.Works\
*Updated:* As requested (premium)

#### `resource_licence`

*String:* License associated with supporting resources (could be for a dataset, codebase, or something else).

*Source:* OA.Works\
*Updated:* As requested (premium)

#### `resource_location_name`

*String:* Location(s) of supporting resource(s).

*Source:* OA.Works\
*Updated:* As requested (premium)

#### `resource_location_url`

*String:* URL(s) found associated with the work (could be for a dataset, codebase, or something else)

*Source:* OA.Works\
*Updated:* As requested (premium)

#### `apc_cost`

*String:* The article processing charge (APC) cost in USD.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `invoice_date`

*String:* Date an APC invoice was issued.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `invoice_year`

*String:* Year an APC invoice was issued.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `invoice_number`

*String:* The invoice number provided on the invoice.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `is_original_research`

*String:* Indicates whether the work is an original research article, meaning peer-reviewed research that presents new findings. This excludes reviews, editorials, methods, and conference proceedings.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `has_epmc_fulltext`

*Boolean:* `true` if the article has full text available in Europe PMC

*Source:* Europe PMC\
*Updated:* Daily (premium)

#### `epmc_licence`

*String:* The license applied to the full text in Europe PMC

*Source:* Europe PMC\
*Updated:* Daily (premium)

#### `submitted_date`

*String:* The date a work was submitted to a journal.

*Source:* OA.Works, PMC\
*Updated:* Weekly (premium)

### Organization specific `supplements`

These keys also start with `supplements.`, but they end with an organization's name or acronym to provide organization-specific data. For instance: `supplements.grantid__bmgf`. In the key names below, `*` stands for that organization-specific suffix.
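
As a sketch of how these keys might be consumed, the helper below collects every supplement whose key ends with a given organization's suffix and strips that suffix. It assumes `supplements` is a dictionary on the record and that the separator is a double underscore, as in `grantid__bmgf`; the helper name `org_supplements` and the sample values (including the grant ID) are hypothetical.

```python
def org_supplements(record, org):
    """Return the organization-specific supplements for `org`, without the suffix."""
    suffix = f"__{org}"  # assumed separator, inferred from `grantid__bmgf`
    supplements = record.get("supplements", {})
    return {
        key[: -len(suffix)]: value
        for key, value in supplements.items()
        if key.endswith(suffix)
    }


# Hypothetical record, for illustration only.
record = {
    "supplements": {
        "grantid__bmgf": "INV-000000",
        "is_compliant__bmgf": True,
        "apc_cost": "2500",  # not organization-specific, so it is skipped
    }
}
print(org_supplements(record, "bmgf"))
```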

#### `grantid*`

*String:* The grant ID(s) associated with the work

*Source:* OA.Works, Crossref\
*Updated:* Weekly (premium)

#### `is_compliant*`

*Boolean:* `true` if the work is compliant with the organization's Open Access policy

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `is_covered_by_policy*`

*Boolean:* `true` if the work is covered under the organization's Open Access policy

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `is_new*`

*Boolean:* `true` if the work has been added since the last time we sent the user a report

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `program*`

*String:* The grant program that supported the work.

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `is_approved_repository*`

*Boolean:* `true` if this work is deposited in an approved repository under the Open Access policy

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `financial_disclosures*`

*Boolean:* `true` if this work's funding statement is actually a financial disclosure

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `remove*`

*Boolean:* `true` if this work should be removed from an organization's results for any reason

*Source:* OA.Works\
*Updated:* Weekly (premium)

#### `fundingstatement*`

*String:* Full text of the funding statement.

*Source:* OA.Works\
*Updated:* Weekly (premium)

## Reused data

Open sources do a fantastic job of providing a lot of the core metadata used by OA.Report.

See Crossref's documentation for the following keys:

* `funder`
  * `name`
  * `award`
  * `DOI`
* `subject`

{% hint style="info" %}
Note: Crossref is not our only source of funding data. However, it is the best source of open, structured data.
{% endhint %}

See [OpenAlex's documentation](https://docs.openalex.org/about-the-data/work#title) for the following keys:

* `doi`
* `title`
* `subtitle`
* `journal`
* `publisher`
* `issn`
* `volume`
* `issue`
* `PMCID`
* `is_retracted`
* `is_paratext`
* `is_oa`
* `published_date`
* `published_year`
* `type`
* `authorships`
  * `author`
    * `id`
    * `display_name`
    * `orcid`
    * `author_position`
  * `institutions`
    * `id`
    * `display_name`
    * `ror`
    * `raw_affiliation_string`
* `concepts`
  * `display_name`
  * `id`
  * `level`
  * `score`

{% hint style="info" %}
We use equivalent Crossref data where OpenAlex data isn't yet available to provide up-to-date results. In some cases, such as `PMCID`, we use other sources to provide more complete coverage.
{% endhint %}

See [Unpaywall's documentation](https://unpaywall.org/data-format) for the following keys:

* `oadoi_is_oa` (see `is_oa`)
* `oa_status`
* `has_repository_copy`
* `has_oa_locations_embargoed`

In the keys below, the location's `host_type` is prepended to the key name as a simplification of Unpaywall's `oa_locations` data:

* `publisher_version`
* `publisher_license`
* `publisher_url_for_pdf`
* `repository_version`
* `repository_license`
* `repository_url`
* `repository_url_for_pdf`
* `repository_url_in_pmc`
* `best_oa_location_url`
* `best_oa_location_url_for_pdf`
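
To illustrate the idea of prepending `host_type`, the sketch below flattens an Unpaywall-style `oa_locations` list into `publisher_*` and `repository_*` keys. It is only an approximation under stated assumptions: it keeps the first location seen for each host type, which may not match how OA.Report actually selects a location, and it does not cover `repository_url_in_pmc` or the `best_oa_location_*` keys. The function name `flatten_oa_locations` and the sample URLs are hypothetical.

```python
def flatten_oa_locations(oa_locations):
    """Flatten oa_locations into host_type-prefixed keys (first location wins)."""
    flat = {}
    for location in oa_locations:
        host = location.get("host_type")
        if host not in ("publisher", "repository"):
            continue
        for field in ("version", "license", "url_for_pdf"):
            # setdefault keeps the value from the first location of this host type.
            flat.setdefault(f"{host}_{field}", location.get(field))
        if host == "repository":
            flat.setdefault("repository_url", location.get("url"))
    return flat


# Hypothetical Unpaywall-style locations, for illustration only.
oa_locations = [
    {
        "host_type": "publisher",
        "version": "publishedVersion",
        "license": "cc-by",
        "url": "https://example.org/article",
        "url_for_pdf": "https://example.org/article.pdf",
    },
    {
        "host_type": "repository",
        "version": "acceptedVersion",
        "license": None,
        "url": "https://example.org/repo/1",
        "url_for_pdf": None,
    },
]
print(flatten_oa_locations(oa_locations))
```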
