COVID-19 Open Research Dataset
Full-text and metadata dataset of COVID-19 and coronavirus-related scholarly articles optimized for machine readability and made available for use by the global research community.
In response to the COVID-19 pandemic, the Allen Institute for AI has partnered with leading research groups to prepare and distribute the COVID-19 Open Research Dataset (CORD-19), a free resource of over 47,000 scholarly articles, including over 36,000 with full text, about COVID-19 and the coronavirus family of viruses for use by the global research community.
This dataset is intended to mobilize researchers to apply recent advances in natural language processing to generate new insights in support of the fight against this infectious disease.
This dataset is made available by the Allen Institute for AI and Semantic Scholar. By accessing, downloading, or otherwise using any content provided in the CORD-19 Dataset, you agree to the Dataset License related to the use of this dataset. Specific licensing information for individual articles in the dataset is available in the metadata file. Additional licensing information is available on the PMC website, medRxiv website, and bioRxiv website.
Volume and Retention
This dataset is stored in JSON format, and the latest release contains over 36,000 full-text articles. Each paper is represented as a single JSON object. The schema is available here.
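Because each paper is a single JSON object, one record can be parsed with the Python standard library alone. The sketch below is illustrative only: the sample record is made up, and the field names it reads (`paper_id`, `metadata.title`, `abstract`, `body_text`, and the per-paragraph `text` and `section` keys) are assumptions about the schema that should be checked against the published schema before use.

```python
import json

# Hypothetical, heavily trimmed paper record. The field names used here
# (paper_id, metadata.title, abstract, body_text) are assumptions about
# the schema, not a copy of it.
SAMPLE_PAPER = """
{
  "paper_id": "example-0001",
  "metadata": {"title": "An example coronavirus study"},
  "abstract": [{"text": "We summarize the findings.", "section": "Abstract"}],
  "body_text": [
    {"text": "Coronaviruses are enveloped RNA viruses.", "section": "Introduction"},
    {"text": "We analyzed published sequences.", "section": "Methods"}
  ]
}
"""


def summarize_paper(raw_json):
    """Parse one paper object and flatten its paragraphs into plain text."""
    paper = json.loads(raw_json)
    body = paper.get("body_text", [])

    # Collect section names in order of first appearance.
    sections = []
    for paragraph in body:
        name = paragraph.get("section", "")
        if name not in sections:
            sections.append(name)

    return {
        "paper_id": paper.get("paper_id"),
        "title": paper.get("metadata", {}).get("title", ""),
        "abstract": " ".join(p.get("text", "") for p in paper.get("abstract", [])),
        "sections": sections,
        "full_text": "\n\n".join(p.get("text", "") for p in body),
    }


summary = summarize_paper(SAMPLE_PAPER)
print(summary["title"])
print(summary["sections"])
```

The `.get` calls with defaults are a deliberate choice: records without an abstract or body text then produce empty strings instead of raising a `KeyError`.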
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
When including CORD-19 data in a publication or redistribution, please cite the dataset as follows:
COVID-19 Open Research Dataset (CORD-19). 2020. Version YYYY-MM-DD. Retrieved from COVID-19 Open Research Dataset (CORD-19). Accessed YYYY-MM-DD. doi:10.5281/zenodo.3715505
In text: (CORD-19, 2020)
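The two YYYY-MM-DD placeholders in the citation can be filled in programmatically. The following is a minimal sketch; the helper name `cord19_citation` and the dates passed to it are hypothetical and only illustrate the format.

```python
from datetime import date


def cord19_citation(version, accessed):
    """Fill the release date and access date into the citation template."""
    return (
        "COVID-19 Open Research Dataset (CORD-19). 2020. "
        f"Version {version.isoformat()}. "
        "Retrieved from COVID-19 Open Research Dataset (CORD-19). "
        f"Accessed {accessed.isoformat()}. doi:10.5281/zenodo.3715505"
    )


# Hypothetical dates, used only to show the resulting string.
citation = cord19_citation(version=date(2020, 3, 20), accessed=date(2020, 4, 1))
print(citation)
```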
For questions about this dataset, contact firstname.lastname@example.org.
MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.