
NOAA Global Forecast System (GFS)


15-day US hourly forecast weather data (for example: temperature, precipitation, and wind) produced by the Global Forecast System (GFS) from the National Oceanic and Atmospheric Administration (NOAA).

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). This dataset provides access to a wide range of atmospheric and land-soil variables, from temperature, wind, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by GFS at a base horizontal resolution of 28 kilometers between grid points, which is used by forecasters who predict weather out to 16 days in the future. The horizontal resolution drops to 70 kilometers between grid points for forecasts between one and two weeks. This dataset is specifically sourced from GFS4.

Volume and retention

This dataset is stored in Parquet format. It is updated daily with 15-day, forward-looking forecast data. There are about 9,000,000,000 rows (200 GB) in total as of 2019.

This dataset contains historical records accumulated from December 2018 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
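For example, a time-bounded query through the SDK looks like the sketch below. The NoaaGfsWeather class and its start/end date parameters appear in the samples later on this page; the optional cols argument for column selection is an assumption borrowed from other Azure Open Datasets classes and may not be supported here.

# Sketch: fetch a bounded time range through the SDK.
# NOTE: `cols` is an assumption and may not be supported by NoaaGfsWeather.
from dateutil import parser
from azureml.opendatasets import NoaaGfsWeather

gfs = NoaaGfsWeather(
    parser.parse('2018-12-20'),
    parser.parse('2018-12-21'),
    cols=['temperature', 'windSpeedGustSurface'])
gfs_df = gfs.to_pandas_dataframe()  # pandas loads are capped at about one day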

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the NOAA Global Forecast System. You can find additional information about the dataset here and here. Email  if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL, OR PUNITIVE DAMAGES, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

| Available in | When to use |
| --- | --- |
| Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
| Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
| Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |

Preview

| currentDatetime | forecastHour | latitude | longitude | precipitableWaterEntireAtmosphere | seaLvlPressure | temperature | windSpeedGustSurface | totalCloudCoverConvectiveCloud | year | month | day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10/21/2020 6:00:00 PM | 96 | -45 | 90 | 7.2 | 98232.9 | 281.49 | 21.0422 | 37 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 94.5 | 9.1 | 98060.1 | 281.89 | 17.5422 | 50 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 90.5 | 7.1 | 98212.1 | 281.49 | 20.5422 | 43 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 91 | 7.6 | 98202.5 | 281.59 | 20.0422 | 50 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 91.5 | 8.5 | 98189.7 | 281.69 | 19.2422 | 52 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 92 | 8.6 | 98165.7 | 281.99 | 18.6422 | 51 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 92.5 | 8.7 | 98143.3 | 282.19 | 18.7422 | 51 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 93 | 8.8 | 98120.9 | 282.49 | 18.8422 | 50 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 93.5 | 8.9 | 98098.5 | 282.59 | 18.3422 | 51 | 2020 | 10 | 21 |
| 10/21/2020 6:00:00 PM | 96 | -45 | 94 | 9 | 98077.7 | 282.19 | 17.8422 | 51 | 2020 | 10 | 21 |
| Name | Data type | Unique | Values (sample) | Description |
| --- | --- | --- | --- | --- |
| currentDatetime | timestamp | 2,420 | 2018-12-07 06:00:00, 2018-12-05 06:00:00 | The forecast model cycle runtime. |
| day | int | 31 | 1, 5 | Day of currentDatetime. |
| forecastHour | int | 129 | 336, 102 | Hours since currentDatetime; the forecast or observed time. |
| latitude | double | 361 | 46.0, -34.5 | Latitude, degrees_north. |
| longitude | double | 1,079 | 97.0, 109.5 | Longitude, degrees_east. |
| month | int | 12 | 12, 9 | Month of currentDatetime. |
| precipitableWaterEntireAtmosphere | double | 5,016,154 | 0.5, 0.2 | Precipitable water for the entire atmosphere layer. Units: kg/m2 |
| seaLvlPressure | double | 8,560,320 | 101120.0, 101088.0 | Pressure at the ground or water surface. Units: Pa |
| snowDepthSurface | double | 1,119 | nan, 1.0 | Snow depth at the ground or water surface. Units: m |
| temperature | double | 5,840,553 | 273.0, 273.1 | Temperature at the ground or water surface. Units: K |
| totalCloudCoverConvectiveCloud | double | 82 | 1.0, 2.0 | Total cloud cover at the convective cloud layer. Units: % |
| windSpeedGustSurface | double | 19,008,240 | 4.5, 5.0 | Wind speed (gust) at the ground or water surface. Units: m/s |
| year | int | 4 | 2019, 2020 | Year of currentDatetime. |
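As a worked example of the schema semantics, the sketch below (assuming a pandas DataFrame df with the columns above; the derived column names are illustrative) computes the valid time of each forecast row and converts the surface units:

# Sketch: derive the forecast valid time and convert units, assuming a
# pandas DataFrame `df` with the schema above (derived names are illustrative).
import pandas as pd

df['validTime'] = df['currentDatetime'] + pd.to_timedelta(df['forecastHour'], unit='h')
df['temperatureC'] = df['temperature'] - 273.15         # kelvin -> degrees Celsius
df['seaLvlPressureHpa'] = df['seaLvlPressure'] / 100.0  # Pa -> hPa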

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
Due to size, we only allow getting 1-day data into pandas dataframe! We are taking the latest day: /year=2018/month=12/day=21/
Target paths: ['/year=2018/month=12/day=21/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00000-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4397-c000.snappy.parquet under container gfsweatherdatacontainer
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00001-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4398-c000.snappy.parquet under container gfsweatherdatacontainer
...
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00199-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4596-c000.snappy.parquet under container gfsweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=91914.45 [ms]
In [2]:
gfs_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24172560 entries, 0 to 120634
Data columns (total 10 columns):
currentDatetime                      datetime64[ns]
forecastHour                         int32
latitude                             float64
longitude                            float64
precipitableWaterEntireAtmosphere    float64
seaLvlPressure                       float64
snowDepthSurface                     float64
temperature                          float64
windSpeedGustSurface                 float64
totalCloudCoverConvectiveCloud       float64
dtypes: datetime64[ns](1), float64(8), int32(1)
memory usage: 1.9 GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "gfsweatherdatacontainer"
folder_name = "GFSWeather/GFSProcessed"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can apply your own filters to the data frame below
print('Loaded as a Pandas data frame: ')
df
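For instance, a minimal filtering sketch (assuming the df loaded above; the bounding box and forecast hour are arbitrary):

# Sketch: restrict to a bounding box and a single forecast hour.
# Column names come from the schema above; the filter values are arbitrary.
subset = df[
    df['latitude'].between(30.0, 50.0)
    & df['longitude'].between(230.0, 300.0)
    & (df['forecastHour'] == 96)
]
print(subset.shape)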

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets on your Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=92636.3 [ms]
In [2]:
display(gfs_df.limit(5))
| currentDatetime | forecastHour | latitude | longitude | precipitableWaterEntireAtmosphere | seaLvlPressure | snowDepthSurface | temperature | windSpeedGustSurface | totalCloudCoverConvectiveCloud | year | month | day |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2018-12-20T00:00:00.000+0000 | 0 | -90.0 | 79.0 | 3.548314332962036 | 71160.9765625 | 1.0099999904632568 | 260.6067810058594 | 12.820813179016113 | null | 2018 | 12 | 20 |
| 2018-12-20T00:00:00.000+0000 | 0 | -90.0 | 268.0 | 3.548314332962036 | 71160.9765625 | 1.0099999904632568 | 260.6067810058594 | 12.820813179016113 | null | 2018 | 12 | 20 |
| 2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 36.5 | 3.4483141899108887 | 70757.7734375 | 1.0099999904632568 | 258.6067810058594 | 12.620813369750977 | null | 2018 | 12 | 20 |
| 2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 43.0 | 3.3483142852783203 | 70597.7734375 | 1.0099999904632568 | 258.3067932128906 | 12.720812797546387 | null | 2018 | 12 | 20 |
| 2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 144.0 | 3.248314380645752 | 69701.7734375 | 1.0099999904632568 | 259.50677490234375 | 12.620813369750977 | null | 2018 | 12 | 20 |
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note this is lazy and won't load any data yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
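From here you can query the view with Spark SQL. The sketch below (the aggregation is illustrative, not part of the original sample) averages surface temperature per forecast hour:

# Sketch: average surface temperature per forecast hour over the 'source' view
display(spark.sql("""
    SELECT forecastHour, AVG(temperature) AS avgTemperatureK
    FROM source
    GROUP BY forecastHour
    ORDER BY forecastHour
"""))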

Azure Synapse

Language: Python
In [23]:
# This is a package in preview.
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_spark_dataframe()
In [24]:
# Display top 5 rows
display(gfs_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note this is lazy and won't load any data yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
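Because the files are partitioned by year/month/day (visible in the folder paths above), filtering on those columns lets Spark prune unneeded files. A minimal sketch (the date is illustrative):

# Sketch: filter on partition columns so Spark prunes unneeded files
display(spark.sql("""
    SELECT currentDatetime, forecastHour, latitude, longitude, temperature
    FROM source
    WHERE year = 2018 AND month = 12 AND day = 21
    LIMIT 10
"""))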