Curated open data made easily accessible on Azure

TartanAir: AirSim Simulation Dataset for Simultaneous Localization and Mapping

Autonomous vehicle data generated in the AirSim simulator for Simultaneous Localization and Mapping (SLAM) research.

Microsoft News Recommendation Dataset

Microsoft News Dataset (MIND) is a large-scale dataset that serves as a benchmark for news recommendation and facilitates research on news recommendation and recommender systems.

NAIP

Aerial imagery from the National Agricultural Imagery Program (NAIP), which provides US-wide high-resolution aerial imagery.

MODIS

Satellite imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS), which has imaged the Earth every 1-2 days since 1999.

US Population by County

US population by gender and race for each US county, from the 2000 and 2010 Decennial Censuses. This dataset is sourced from the United States Census Bureau.

US Population by ZIP Code

US population by gender and race for each US ZIP code, from the 2010 Decennial Census. This dataset is sourced from the United States Census Bureau.

UK Met Office Global Weather Data for COVID-19 Analysis

UK Met Office global weather dataset for researchers to explore relationships between COVID-19 incidence and environmental factors.

Public Holidays

Worldwide public holiday data sourced from the PyPI holidays package and Wikipedia, covering 38 countries or regions from 1970 to 2099.

Russian Open Speech To Text

Russian Open STT is a large-scale open speech-to-text dataset for the Russian language.

NYC Taxi & Limousine Commission - green taxi trip records

The green taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.

NYC Taxi & Limousine Commission - yellow taxi trip records

The yellow taxi trip records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
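
To make the record layout described for the green and yellow taxi trips more concrete, here is a minimal sketch of a single trip as a Python dataclass. The class name, field names, units, and sample values are all hypothetical stand-ins for the kinds of fields listed above; they are not the datasets' actual column names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class TaxiTrip:
    # Hypothetical field names; the real datasets may name and type these differently.
    pickup_time: datetime
    dropoff_time: datetime
    pickup_location_id: int
    dropoff_location_id: int
    trip_distance_miles: float  # unit is an assumption
    fare_amount: float          # two example components of an itemized fare
    tip_amount: float
    rate_type: str
    payment_type: str
    passenger_count: int        # driver-reported

    def duration_minutes(self) -> float:
        """Elapsed time between pick-up and drop-off, in minutes."""
        return (self.dropoff_time - self.pickup_time).total_seconds() / 60.0

    def total_fare(self) -> float:
        """Sum of the itemized fare components modeled here."""
        return self.fare_amount + self.tip_amount


# Made-up sample record for illustration only.
trip = TaxiTrip(
    pickup_time=datetime(2019, 6, 1, 8, 0),
    dropoff_time=datetime(2019, 6, 1, 8, 15),
    pickup_location_id=41,
    dropoff_location_id=74,
    trip_distance_miles=2.4,
    fare_amount=12.0,
    tip_amount=2.5,
    rate_type="standard",
    payment_type="credit_card",
    passenger_count=1,
)
print(trip.duration_minutes(), trip.total_fare())
```

Derived values such as trip duration are not stored fields in this sketch; they are computed from the pick-up and drop-off timestamps.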

US National Employment Hours and Earnings

The Current Employment Statistics (CES) program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls in the United States.

NASADEM

NASADEM provides global topographic data derived primarily from data captured by NASA’s Shuttle Radar Topography Mission.

GOES-16

GOES-16 provides weather imagery from NOAA’s GOES-16 satellite.

NOAA Integrated Surface Data (ISD)

NOAA Integrated Surface Data (ISD) provides worldwide hourly weather history data sourced from the National Oceanic and Atmospheric Administration (NOAA).

NYC Taxi & Limousine Commission - For-Hire Vehicle (FHV) trip records

The For-Hire Vehicle trip records include fields capturing the dispatching base license number and the pick-up date, time, and taxi zone location ID.

NOAA NEXRAD Level II

Recent Level II data from NEXRAD, a network of 159 weather radar stations distributed across the United States.

NOAA Global Forecast System (GFS)

15-day US hourly weather forecast data produced by the Global Forecast System (GFS) from the National Oceanic and Atmospheric Administration (NOAA).

NOAA Global Hydro Estimator (GHE)

The Global Hydro Estimator (GHE) dataset provides global rainfall estimates in 15-minute intervals derived from satellite imagery and data from NOAA’s Global Forecast System.

Machine Learning Samples

A collection of different types of machine learning datasets, such as tabular datasets, time-series datasets, images, text, and more.

US Producer Price Index - Industry

The Producer Price Index (PPI) is a measure of average change over time in the selling prices received by domestic producers for their output.

US Labor Force Statistics

US Labor Force Statistics provides labor force statistics, labor force participation rates, and the civilian noninstitutional population by age, gender, race, and ethnic group in the United States.

US Producer Price Index - Commodities

The Producer Price Index (PPI) is a measure of average change over time in the selling prices received by domestic producers for their commodities.

US Local Area Unemployment Statistics

The US Local Area Unemployment Statistics dataset provides monthly and annual employment, unemployment, and labor force data for Census regions and divisions, states, counties, metropolitan areas, and many cities in the United States.

US State Employment Hours and Earnings

The Current Employment Statistics (CES) program produces detailed industry estimates of nonfarm employment, hours, and earnings of workers on payrolls in the United States.

US Consumer Price Index

The Consumer Price Index (CPI) is a measure of the average change over time in the prices paid by urban consumers for a market basket of consumer goods and services.
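
As a rough illustration of what "average change over time in the prices of a market basket" means, the sketch below computes a simple fixed-basket price index in Python. The function name `fixed_basket_index`, the basket items, the quantities, and the prices are all made-up illustrative assumptions; the actual CPI methodology is considerably more involved.

```python
def fixed_basket_index(base_prices, current_prices, quantities):
    """Cost of a fixed basket at current prices relative to base prices, scaled so the base period is 100."""
    base_cost = sum(base_prices[item] * qty for item, qty in quantities.items())
    current_cost = sum(current_prices[item] * qty for item, qty in quantities.items())
    return 100.0 * current_cost / base_cost


# Hypothetical basket: quantities stay fixed, only prices change between periods.
quantities = {"bread": 10, "milk": 20, "bus_fare": 40}
base_prices = {"bread": 2.00, "milk": 1.00, "bus_fare": 2.50}
current_prices = {"bread": 2.20, "milk": 1.10, "bus_fare": 2.50}

index = fixed_basket_index(base_prices, current_prices, quantities)
percent_change = index - 100.0  # change relative to the base period
print(f"index = {index:.2f}, change = {percent_change:.2f}%")
```

Because the quantities are held fixed, the index moves only when prices move, which is the idea behind tracking a basket over time.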

Harmonized Landsat Sentinel-2

The Harmonized Landsat Sentinel-2 (HLS) dataset includes satellite imagery data from the Landsat-8 (2013 to present) and Sentinel-2 (2015 to present) satellites, aligned to a common grid and processed to compatible color spaces.

Genomics Data Lake

The Genomics Data Lake provides a variety of public datasets that you can access for free and integrate into your genomics analysis workflows and applications. The datasets include genome sequences, variant info and subject/sample metadata in BAM, FASTA, VCF, CSV file formats.

COVID-19 Data Lake

The COVID-19 Data Lake is a collection of COVID-19-related datasets from various sources, covering testing and patient outcome tracking data, social distancing policy, hospital capacity, mobility, and more.

Daymet

Gridded estimates of daily weather parameters in North America from meteorological observations.

COVID-19 Open Research Dataset

A full-text and metadata dataset of COVID-19 and coronavirus-related scholarly articles optimized for machine readability and made available for use by the global research community.

Seattle Safety Data

Seattle Fire Department 911 dispatches. This dataset is updated daily and contains historical records accumulated from 2010 to the present.

San Francisco Safety Data

Fire department calls for service and 311 cases in San Francisco. This dataset contains historical records accumulated from 2015 to the present.

New York City Safety Data

This dataset contains all New York City 311 service requests from 2010 to the present. It’s stored in Parquet format and updated daily.

Chicago Safety Data

Read data about 311 calls reported to the city of Chicago. This dataset is stored in Parquet format and is updated daily.

Boston Safety Data

Read data about 311 calls reported to the city of Boston. This dataset is stored in Parquet format and is updated daily.
