Skip Navigation

NOAA Global Forecast System (GFS)

Weather GFS NOAA

15-day US hourly weather forecast data (example: temperature, precipitation, wind) produced by the Global Forecast System (GFS) from the National Oceanic and Atmospheric Administration (NOAA).

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). Dozens of atmospheric and land-soil variables are available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by the GFS at a base horizontal resolution of 18 miles (28 kilometers) between grid points, which is used by the operational forecasters who predict weather out to 16 days in the future. Horizontal resolution drops to 44 miles (70 kilometers) between grid point for forecasts between one week and two weeks. This dataset is specifically sourced from GFS4.

Volume and Retention

This dataset is stored in Parquet format. It is updated daily with 15-day, forward-looking forecast data. There are about 9B rows (200GB) in total as of 2019.

This dataset contains historical records accumulated from December 2018 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage Location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional Information

This dataset is sourced from NOAA Global Forecast System. Additional information about this dataset can be found here and here. Email if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

currentDatetime forecastHour latitude longitude precipitableWaterEntireAtmosphere seaLvlPressure snowDepthSurface temperature windSpeedGustSurface totalCloudCoverConvectiveCloud year month day
5/15/2020 6:00:00 AM 15 -90 25 0.400000005960464 68831.3828125 1.0900000333786 231.699996948242 10.5314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -89.5 225 0.5 68368.984375 1.0900000333786 231.600006103516 11.0314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -89.5 296 0.5 69378.578125 1.0900000333786 232.100006103516 10.431468963623 0 2020 5 15
5/15/2020 6:00:00 AM 15 -90 91.5 0.400000005960464 68831.3828125 1.0900000333786 231.699996948242 10.5314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -90 218.5 0.400000005960464 68831.3828125 1.0900000333786 231.699996948242 10.5314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -90 223.5 0.400000005960464 68831.3828125 1.0900000333786 231.699996948242 10.5314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -90 350 0.400000005960464 68831.3828125 1.0900000333786 231.699996948242 10.5314693450928 0 2020 5 15
5/15/2020 6:00:00 AM 15 -89.5 111 0.400000005960464 67728.984375 1.08000004291534 230.899993896484 12.6314687728882 0 2020 5 15
5/15/2020 6:00:00 AM 15 -89.5 186.5 0.400000005960464 67685.78125 1.0900000333786 230.5 12.2314691543579 0 2020 5 15
5/15/2020 6:00:00 AM 15 -89.5 200 0.400000005960464 67962.578125 1.0900000333786 230.899993896484 11.2314691543579 0 2020 5 15
Name Data type Unique Values (sample) Description
currentDatetime timestamp 1,789 2018-12-09 00:00:00
2019-01-08 06:00:00

The forecast model cycle runtime.

day int 31 1
9

Day of currentDatetime.

forecastHour int 129 336
102

Hour since currentDatetime, forecast or observation time.

latitude double 361 -81.5
-22.5

Latitude, degrees_north.

longitude double 720 330.5
325.0

Longitude, degrees_east.

month int 12 12
1

Month of currentDatetime.

precipitableWaterEntireAtmosphere double 4,855,833 0.20000000298023224
0.30000001192092896

Precipitable water at entire atmosphere layer. Units: kg.m-2

seaLvlPressure double 8,153,154 101120.796875
101152.796875

Pressure at ground or water surface. Units: Pa

snowDepthSurface double 620 nan
1.0

Snow depth at ground or water surface. Units: m

temperature double 5,678,823 273.1000061035156
273.0

Temperature at ground or water surface. Units: K

totalCloudCoverConvectiveCloud double 82 1.0
2.0

Total cloud cover at convective cloud layer. Units: %

windSpeedGustSurface double 17,325,903 4.5
4.599999904632568

Wind speed (gust) at ground or water surface. Units: m/s

year int 4 2019
2018

Year of currentDatetime.

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Package: Language: Python Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe Due to size, we only allow getting 1-day data into pandas dataframe! We are taking the latest day: /year=2018/month=12/day=21/ Target paths: ['/year=2018/month=12/day=21/'] Looking for parquet files... Reading them into Pandas dataframe... Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00000-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4397-c000.snappy.parquet under container gfsweatherdatacontainer Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00001-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4398-c000.snappy.parquet under container gfsweatherdatacontainer ... Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00199-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4596-c000.snappy.parquet under container gfsweatherdatacontainer Done. ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=91914.45 [ms]
In [2]:
gfs_df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 24172560 entries, 0 to 120634 Data columns (total 10 columns): currentDatetime datetime64[ns] forecastHour int32 latitude float64 longitude float64 precipitableWaterEntireAtmosphere float64 seaLvlPressure float64 snowDepthSurface float64 temperature float64 windSpeedGustSurface float64 totalCloudCoverConvectiveCloud float64 dtypes: datetime64[ns](1), float64(8), int32(1) memory usage: 1.9 GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "gfsweatherdatacontainer"
folder_name = "GFSWeather/GFSProcessed"
In [3]:
from azure.storage.blob import BlockBlobService

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception("Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' + folder_name + ' in container "' + container_name + '"...')
blob_service = BlockBlobService(account_name = azure_storage_account_name, sas_token = azure_storage_sas_token,)
blobs = blob_service.list_blobs(container_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName=''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
parquet_file=blob_service.get_blob_to_path(container_name, targetBlobName, filename)
In [4]:
# Read the local parquet file into Pandas data frame
import pyarrow.parquet as pq
import pandas as pd

appended_df = []
print('Reading the local parquet file into Pandas data frame')
df = pq.read_table(filename).to_pandas()
In [5]:
# you can add your filter at below
print('Loaded as a Pandas data frame: ')
df
In [6]:
 

Azure Databricks

Package: Language: Python Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=92636.3 [ms]
In [2]:
display(gfs_df.limit(5))
currentDatetimeforecastHourlatitudelongitudeprecipitableWaterEntireAtmosphereseaLvlPressuresnowDepthSurfacetemperaturewindSpeedGustSurfacetotalCloudCoverConvectiveCloudyearmonthday
2018-12-20T00:00:00.000+00000-90.079.03.54831433296203671160.97656251.0099999904632568260.606781005859412.820813179016113null20181220
2018-12-20T00:00:00.000+00000-90.0268.03.54831433296203671160.97656251.0099999904632568260.606781005859412.820813179016113null20181220
2018-12-20T00:00:00.000+00000-89.536.53.448314189910888770757.77343751.0099999904632568258.606781005859412.620813369750977null20181220
2018-12-20T00:00:00.000+00000-89.543.03.348314285278320370597.77343751.0099999904632568258.306793212890612.720812797546387null20181220
2018-12-20T00:00:00.000+00000-89.5144.03.24831438064575269701.77343751.0099999904632568259.5067749023437512.620813369750977null20181220
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# SPARK read parquet, note that it won't load any data yet by now
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))