NOAA Global Forecast System (GFS)

15-day US hourly forecast weather data (example: temperature, precipitation, wind) produced by the Global Forecast System (GFS) from the National Oceanic and Atmospheric Administration (NOAA).

The Global Forecast System (GFS) is a weather forecast model produced by the National Centers for Environmental Prediction (NCEP). A wide range of atmospheric and land-soil variables is available through this dataset, from temperatures, winds, and precipitation to soil moisture and atmospheric ozone concentration. The entire globe is covered by the GFS at a horizontal resolution of 28 kilometers between grid points, which is used by forecasters to predict weather up to 16 days into the future. Horizontal resolution drops to 70 kilometers between grid points for forecasts between one and two weeks out. This dataset is specifically sourced from GFS4.

Volume and retention

This dataset is stored in Parquet format. It is updated daily with 15-day, forward-looking forecast data. There are about 9 billion rows (200 GB) in total as of 2019.

This dataset contains historical records accumulated from December 2018 to the present. You can use parameter settings in our SDK to fetch data within a specific time range, as in the sketch below.
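
A minimal sketch of such a time-range query, assuming the azureml-opendatasets package (the NoaaGfsWeather class and this pattern appear in the full notebook samples further down this page; the one-day window here is illustrative):

# Sketch: fetch one day of GFS forecasts through the SDK (illustrative dates).
from azureml.opendatasets import NoaaGfsWeather
from dateutil import parser

gfs = NoaaGfsWeather(parser.parse('2018-12-20'), parser.parse('2018-12-21'))
gfs_df = gfs.to_pandas_dataframe()  # pandas access is limited to one day of data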

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the NOAA Global Forecast System. You can find more information about the dataset here and here. Send an email if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

currentDatetime | forecastHour | latitude | longitude | precipitableWaterEntireAtmosphere | seaLvlPressure | temperature | windSpeedGustSurface | totalCloudCoverConvectiveCloud | year | month | day
9/16/2020 6:00:00 PM | 96 | -45 | 90 | 9.7 | 99643.5 | 282.037 | 17.7186 | 38 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 94.5 | 9.9 | 99611.5 | 281.537 | 17.8186 | 14 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 90.5 | 11.3 | 99659.5 | 281.837 | 17.4186 | 41 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 91 | 10.8 | 99629.1 | 281.737 | 17.3186 | 38 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 91.5 | 10.5 | 99630.7 | 281.437 | 18.1186 | 41 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 92 | 11.6 | 99629.1 | 281.437 | 17.5186 | 41 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 92.5 | 11.1 | 99613.1 | 281.337 | 17.0186 | 36 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 93 | 10.8 | 99611.5 | 281.337 | 17.2186 | 25 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 93.5 | 10.4 | 99606.7 | 281.437 | 16.9186 | 10 | 2020 | 9 | 16
9/16/2020 6:00:00 PM | 96 | -45 | 94 | 9.9 | 99603.5 | 281.437 | 17.1186 | 6 | 2020 | 9 | 16
Name | Data type | Unique | Values (sample) | Description
currentDatetime | timestamp | 2,266 | 2019-01-08 06:00:00, 2018-12-01 00:00:00 | The forecast model cycle runtime.
day | int | 31 | 1, 5 | Day of currentDatetime.
forecastHour | int | 129 | 336, 102 | Hours since currentDatetime; forecast or observed time.
latitude | double | 361 | -4.0, -52.0 | Latitude, degrees_north.
longitude | double | 1,079 | 6.5, 47.0 | Longitude, degrees_east.
month | int | 12 | 12, 8 | Month of currentDatetime.
precipitableWaterEntireAtmosphere | double | 4,984,685 | 0.5, 0.20000000298023224 | Precipitable water of the entire atmosphere layer. Units: kg.m-2
seaLvlPressure | double | 8,558,967 | 101152.0, 101132.0 | Pressure at ground or water surface. Units: Pa
snowDepthSurface | double | 1,119 | nan, 1.0 | Snow depth at ground or water surface. Units: m
temperature | double | 5,840,551 | 273.0, 273.1 | Temperature at ground or water surface. Units: K
totalCloudCoverConvectiveCloud | double | 82 | 1.0, 2.0 | Total cloud cover of the convective cloud layer. Units: %
windSpeedGustSurface | double | 18,886,424 | 4.5, 5.0 | Wind speed (gust) at ground or water surface. Units: m/s
year | int | 4 | 2019, 2020 | Year of currentDatetime.
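
The temperature and pressure columns use SI units (kelvins and pascals), so downstream code typically converts them. A minimal pandas sketch, assuming a DataFrame gfs_df loaded as in the notebook samples below (column names come from the schema; the conversion constants are standard):

# Convert schema units to more familiar ones (K -> degrees C, Pa -> hPa).
gfs_df['temperature_c'] = gfs_df['temperature'] - 273.15
gfs_df['seaLvlPressure_hpa'] = gfs_df['seaLvlPressure'] / 100.0

# Example: 96-hour forecasts within a small latitude/longitude box.
box = gfs_df[
    (gfs_df['forecastHour'] == 96)
    & gfs_df['latitude'].between(-46, -44)
    & gfs_df['longitude'].between(90, 95)
]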


Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
Due to size, we only allow getting 1-day data into pandas dataframe! We are taking the latest day: /year=2018/month=12/day=21/
Target paths: ['/year=2018/month=12/day=21/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00000-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4397-c000.snappy.parquet under container gfsweatherdatacontainer
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00001-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4398-c000.snappy.parquet under container gfsweatherdatacontainer
...
Reading GFSWeather/GFSProcessed/year=2018/month=12/day=21/part-00199-tid-570650763889113128-ff3109d0-23cf-4024-a096-63964952b0c7-4596-c000.snappy.parquet under container gfsweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=91914.45 [ms]
In [2]:
gfs_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 24172560 entries, 0 to 120634
Data columns (total 10 columns):
currentDatetime                      datetime64[ns]
forecastHour                         int32
latitude                             float64
longitude                            float64
precipitableWaterEntireAtmosphere    float64
seaLvlPressure                       float64
snowDepthSurface                     float64
temperature                          float64
windSpeedGustSurface                 float64
totalCloudCoverConvectiveCloud       float64
dtypes: datetime64[ns](1), float64(8), int32(1)
memory usage: 1.9 GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "gfsweatherdatacontainer"
folder_name = "GFSWeather/GFSProcessed"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your filters below
print('Loaded as a Pandas data frame: ')
df
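
For instance, a hypothetical filter that keeps only rows with below-freezing surface temperatures (the column name is taken from the schema above; temperature is in kelvins):

# Hypothetical example filter: below-freezing surface temperatures (K).
frozen = df[df['temperature'] < 273.15]
frozen.head()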

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets on your Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=92636.3 [ms]
In [2]:
display(gfs_df.limit(5))
currentDatetime | forecastHour | latitude | longitude | precipitableWaterEntireAtmosphere | seaLvlPressure | snowDepthSurface | temperature | windSpeedGustSurface | totalCloudCoverConvectiveCloud | year | month | day
2018-12-20T00:00:00.000+0000 | 0 | -90.0 | 79.0 | 3.548314332962036 | 71160.9765625 | 1.0099999904632568 | 260.6067810058594 | 12.820813179016113 | null | 2018 | 12 | 20
2018-12-20T00:00:00.000+0000 | 0 | -90.0 | 268.0 | 3.548314332962036 | 71160.9765625 | 1.0099999904632568 | 260.6067810058594 | 12.820813179016113 | null | 2018 | 12 | 20
2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 36.5 | 3.4483141899108887 | 70757.7734375 | 1.0099999904632568 | 258.6067810058594 | 12.620813369750977 | null | 2018 | 12 | 20
2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 43.0 | 3.3483142852783203 | 70597.7734375 | 1.0099999904632568 | 258.3067932128906 | 12.720812797546387 | null | 2018 | 12 | 20
2018-12-20T00:00:00.000+0000 | 0 | -89.5 | 144.0 | 3.248314380645752 | 69701.7734375 | 1.0099999904632568 | 259.50677490234375 | 12.620813369750977 | null | 2018 | 12 | 20
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
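
With the temporary view registered, ordinary Spark SQL works against it. A hypothetical aggregation, assuming the 'source' view created above (column names come from the schema; temperature is in kelvins):

# Example: average surface temperature per forecast hour.
display(spark.sql("""
    SELECT forecastHour, AVG(temperature) AS avgTempK
    FROM source
    GROUP BY forecastHour
    ORDER BY forecastHour
"""))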

Azure Synapse

Language: Python
In [23]:
# This is a package in preview.
from azureml.opendatasets import NoaaGfsWeather

from dateutil import parser


start_date = parser.parse('2018-12-20')
end_date = parser.parse('2018-12-21')
gfs = NoaaGfsWeather(start_date, end_date)
gfs_df = gfs.to_spark_dataframe()
In [24]:
# Display top 5 rows
display(gfs_df.limit(5))
Out[24]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "gfsweatherdatacontainer"
blob_relative_path = "GFSWeather/GFSProcessed"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))