Worldwide hourly weather history data (for example, temperature, precipitation, wind) sourced from the National Oceanic and Atmospheric Administration (NOAA).
The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from more than 35,000 stations, though the best spatial coverage is in North America, Europe, Australia, and parts of Asia. The parameters included are: air quality, atmospheric pressure, atmospheric temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained within the digital database as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.
Volume and retention
This dataset is stored in Parquet format. It is updated daily and contained about 400 million rows (20 GB) in total as of 2019.
This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in the SDK to fetch data within a specific time range.
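Because records only go back to 2008, it can help to clip a requested time range before passing it to the SDK. The following is a minimal sketch, not behavior of the SDK itself: `clamp_to_coverage`, `load_isd`, and `DATASET_START` are hypothetical names, and the start date of January 1, 2008 is an assumption (the description only says "2008"). The `NoaaIsdWeather(start_date, end_date)` call mirrors the usage in the samples further down this page and is imported lazily so the date helper runs without the package installed.

```python
from datetime import datetime

# Assumption: coverage starts on January 1, 2008 (the page only says "2008").
DATASET_START = datetime(2008, 1, 1)


def clamp_to_coverage(start_date, end_date, now=None):
    """Clip a requested range to [DATASET_START, now]; raise if nothing is left."""
    now = now or datetime.today()
    start = max(start_date, DATASET_START)
    end = min(end_date, now)
    if start > end:
        raise ValueError("Requested range lies outside the dataset's coverage")
    return start, end


def load_isd(start_date, end_date):
    """Fetch rows for the clipped range as a pandas DataFrame."""
    # Lazy import: azureml-opendatasets is only needed when data is fetched.
    from azureml.opendatasets import NoaaIsdWeather

    start, end = clamp_to_coverage(start_date, end_date)
    return NoaaIsdWeather(start, end).to_pandas_dataframe()


# Example: a request that begins before 2008 is clipped to the assumed start.
start, end = clamp_to_coverage(
    datetime(2007, 6, 1), datetime(2008, 3, 1), now=datetime(2021, 2, 27)
)
print(start.date(), end.date())  # prints: 2008-01-01 2008-03-01
```

Keeping the clipping separate from the fetch makes the range logic easy to check without network access.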
Storage location
This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.
Additional information
This dataset is sourced from the NOAA Integrated Surface Database. Additional information about this dataset is available from NOAA. Email ncei.orders@noaa.gov if you have any questions about the data source.
Notices
MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Access
Available in | When to use |
---|---|
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset. |
Preview
usaf | wban | datetime | latitude | longitude | elevation | cloudCoverage | stationName | countryOrRegion | p_k | year | day | version | month |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
789700 | 11634 | 2/27/2021 7:59:00 PM | 10.583 | -61.35 | 12 | null | PIARCO INTL AP | TD | 789700-11634 | 2021 | 27 | 1 | 2 |
479710 | 42402 | 2/27/2021 2:59:00 PM | 27.1 | 142.183 | 8 | null | CHICHIJIMA | JA | 479710-42402 | 2021 | 27 | 1 | 2 |
479310 | 42204 | 2/27/2021 2:59:00 PM | 26.356 | 127.768 | 44 | null | KADENA AB | JA | 479310-42204 | 2021 | 27 | 1 | 2 |
912320 | 41418 | 2/27/2021 1:59:00 PM | 15.117 | 145.717 | 64 | null | FRANCISCO C. ADA/SAIPAN INTERNATIONAL ARPT | CQ | 912320-41418 | 2021 | 27 | 1 | 2 |
912180 | 41414 | 2/27/2021 1:59:00 PM | 13.583 | 144.917 | 187 | null | ANDERSEN AFB AIRPORT | GQ | 912180-41414 | 2021 | 27 | 1 | 2 |
912460 | 41606 | 2/27/2021 11:59:00 AM | 19.283 | 166.65 | 4 | null | WAKE ISLAND AIRFIELD | WQ | 912460-41606 | 2021 | 27 | 1 | 2 |
934360 | 00488 | 2/27/2021 11:59:00 AM | -41.327 | 174.805 | 12 | null | WELLINGTON INTL | NZ | 934360-00488 | 2021 | 27 | 1 | 2 |
704540 | 25704 | 2/27/2021 9:59:00 AM | 51.883 | -176.65 | 6 | null | ADAK NAS | US | 704540-25704 | 2021 | 27 | 1 | 2 |
912850 | 21504 | 2/27/2021 9:59:00 AM | 19.721 | -155.048 | 12 | null | HILO INTERNATIONAL AIRPORT | US | 912850-21504 | 2021 | 27 | 1 | 2 |
911700 | 22508 | 2/27/2021 9:59:00 AM | 21.487 | -158.028 | 255 | null | WHEELER ARMY AIRFIELD | US | 911700-22508 | 2021 | 27 | 1 | 2 |
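The preview rows suggest how the derived columns relate to the raw ones: `p_k` looks like `usaf` and `wban` joined by a hyphen, and `year`, `month`, and `day` match the parts of `datetime`. The sketch below checks that reading against the first preview row. The function name `derive_keys` is hypothetical, and the format string is an assumption that fits how the preview renders timestamps, not necessarily how they are stored in the Parquet files.

```python
from datetime import datetime

# Assumption: matches the preview rendering, e.g. "2/27/2021 7:59:00 PM".
PREVIEW_DATETIME_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def derive_keys(usaf, wban, datetime_text):
    """Rebuild p_k, year, month, and day from a preview row's raw fields."""
    observed_at = datetime.strptime(datetime_text, PREVIEW_DATETIME_FORMAT)
    return {
        "p_k": f"{usaf}-{wban}",
        "year": observed_at.year,
        "month": observed_at.month,
        "day": observed_at.day,
    }


# First preview row (PIARCO INTL AP). Station IDs are kept as strings so that
# leading zeros, as in wban "00488", are not lost.
row = derive_keys("789700", "11634", "2/27/2021 7:59:00 PM")
print(row)
```

The result can be compared field by field with the `p_k`, `year`, `month`, and `day` cells of the same preview row.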
Name | Data type | Unique | Values (sample) | Description |
---|---|---|---|---|
cloudCoverage | string | 8 | CLR, OVC | The fraction of the sky covered by all visible clouds. Cloud coverage values: CLR = Clear skies; FEW = Few clouds; SCT = Scattered clouds; BKN = Broken cloud cover; OVC = Overcast; OBS = Sky is obscured/can't be estimated; POBS = Sky is partially obscured |
countryOrRegion | string | 245 | US, CA | Country or region code. |
datetime | timestamp | 6,920,272 | 2018-01-18 12:00:00, 2018-02-28 12:00:00 | The UTC date and time of a GEOPHYSICAL-POINT-OBSERVATION. |
day | int | 31 | 1, 6 | Day of the column datetime. |
elevation | double | 2,369 | 5.0, 3.0 | The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to mean sea level. |
latitude | double | 34,854 | 38.544, 31.78 | The latitude of a GEOPHYSICAL-POINT-OBSERVATION, where the southern hemisphere is negative. |
longitude | double | 58,179 | -86.0, -96.622 | The longitude of a GEOPHYSICAL-POINT-OBSERVATION, where values west from 000000 to 179999 are signed negative. |
month | int | 12 | 1, 12 | Month of the column datetime. |
p_k | string | 17,415 | 999999-53131, 999999-22016 | usaf-wban |
pastWeatherIndicator | int | 11 | 2, 6 | Past weather indicator, which shows the weather in the past hour. 0: Cloud covering 1/2 or less of the sky throughout the appropriate period; 1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period; 2: Cloud covering more than 1/2 of the sky throughout the appropriate period; 3: Sandstorm, duststorm or blowing snow; 4: Fog or ice fog or thick haze; 5: Drizzle; 6: Rain; 7: Snow, or rain and snow mixed; 8: Shower(s); 9: Thunderstorm(s) with or without precipitation |
precipDepth | double | 5,671 | 9999.0, 3.0 | The depth of LIQUID-PRECIPITATION measured at the time of an observation. Units: millimeters. MIN: 0000; MAX: 9998; 9999 = Missing; SCALING FACTOR: 10. |
precipTime | double | 44 | 1.0, 24.0 | The quantity of time over which the LIQUID-PRECIPITATION was measured. Units: hours. MIN: 00; MAX: 98; 99 = Missing. |
presentWeatherIndicator | int | 101 | 10, 5 | Present weather indicator, which shows the weather in the present hour. 00: Cloud development not observed or not observable; 01: Clouds generally dissolving or becoming less developed; 02: State of sky on the whole unchanged; 03: Clouds generally forming or developing; 04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes; 05: Haze; 06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation; 07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) or sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station; 08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm; 09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour. For more: The section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf |
seaLvlPressure | double | 2,214 | 1015.0, 1014.2 | The air pressure relative to mean sea level. MIN: 08600; MAX: 10900; UNITS: hectopascals |
snowDepth | double | 652 | 1.0, 3.0 | The depth of snow and ice on the ground. MIN: 0000; MAX: 1200; UNITS: centimeters |
stationName | string | 16,677 | TUCSON 11 W, PANTHER JUNCTION 2 N | Name of the weather station. |
temperature | double | 1,467 | 15.0, 13.0 | The temperature of the air. MIN: -0932; MAX: +0618; UNITS: degrees Celsius |
usaf | string | 16,732 | 999999, 062350 | AIR FORCE CATALOG station number. |
version | double | 1 | 1.0 | |
wban | string | 2,556 | 99999, 53131 | NCDC WBAN number. |
windAngle | int | 362 | 180, 270 | The angle, measured clockwise, between true north and the direction from which the wind is blowing. MIN: 001; MAX: 360; UNITS: angular degrees |
windSpeed | double | 621 | 2.1, 1.5 | The rate of horizontal travel of air past a fixed point. MIN: 0000; MAX: 0900; UNITS: meters per second |
year | int | 14 | 2019, 2020 | Year of the column datetime. |
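The schema above encodes missing precipitation values as sentinels (9999 for `precipDepth`, 99 for `precipTime`) and cloud cover as short codes. A minimal sketch of decoding a single record follows. `clean_record`, `MISSING_SENTINELS`, and `CLOUD_COVERAGE_LABELS` are hypothetical names, and whether sentinels actually appear in a given extract should be checked against the data (the sample value 9999.0 listed for `precipDepth` suggests they can).

```python
# Sentinel values that the schema table documents as "missing".
MISSING_SENTINELS = {"precipDepth": 9999.0, "precipTime": 99.0}

# Cloud coverage codes as listed in the schema table.
CLOUD_COVERAGE_LABELS = {
    "CLR": "Clear skies",
    "FEW": "Few clouds",
    "SCT": "Scattered clouds",
    "BKN": "Broken cloud cover",
    "OVC": "Overcast",
    "OBS": "Sky is obscured/can't be estimated",
    "POBS": "Sky is partially obscured",
}


def clean_record(record):
    """Return a copy with sentinels replaced by None and a cloud label added."""
    cleaned = dict(record)
    for column, sentinel in MISSING_SENTINELS.items():
        if cleaned.get(column) == sentinel:
            cleaned[column] = None
    code = cleaned.get("cloudCoverage")
    # Unknown or absent codes map to None rather than raising.
    cleaned["cloudCoverageLabel"] = CLOUD_COVERAGE_LABELS.get(code)
    return cleaned


sample = {"precipDepth": 9999.0, "precipTime": 1.0, "cloudCoverage": "OVC"}
print(clean_record(sample))
```

Returning a copy leaves the input untouched, so the same raw record can be decoded again under different assumptions.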
Azure Notebooks
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather
from datetime import datetime
from dateutil.relativedelta import relativedelta
end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
isd_df.info()
# Pip install packages
import os, sys
!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
from azure.storage.blob import BlobServiceClient
if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")
print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
account_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
# With an empty SAS token no credential is passed, so access is anonymous.
blob_service_client = BlobServiceClient(
    account_url, azure_storage_sas_token if azure_storage_sas_token else None)
container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
# Sort blob names in descending order and take the first parquet file found.
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break
print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
# Read the parquet file into Pandas data frame
import pandas as pd
print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
# Add your own filters below
print('Loaded as a Pandas data frame: ')
df
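The comment near the end of the sample above leaves filtering to the reader. As one possible filter, the sketch below keeps rows for a single `countryOrRegion` code within a latitude band. It runs on a small hand-built frame whose values are copied from the preview table rather than on the downloaded file; `filter_observations` and the chosen thresholds are hypothetical.

```python
import pandas as pd


def filter_observations(df, country, min_latitude, max_latitude):
    """Keep rows for one country/region code within a latitude band."""
    mask = (
        (df["countryOrRegion"] == country)
        & df["latitude"].between(min_latitude, max_latitude)
    )
    return df.loc[mask].reset_index(drop=True)


# Stand-in for the downloaded data: values copied from the preview table.
preview = pd.DataFrame(
    {
        "stationName": [
            "ADAK NAS",
            "HILO INTERNATIONAL AIRPORT",
            "WHEELER ARMY AIRFIELD",
            "WELLINGTON INTL",
        ],
        "countryOrRegion": ["US", "US", "US", "NZ"],
        "latitude": [51.883, 19.721, 21.487, -41.327],
    }
)

us_low_latitude = filter_observations(preview, "US", 18.0, 23.0)
print(us_low_latitude["stationName"].tolist())
```

With the downloaded file, the same call would take `df` in place of `preview`.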
Azure Databricks
# This is a package in preview.
# You need to pip install azureml-opendatasets on the Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather
from datetime import datetime
from dateutil.relativedelta import relativedelta
end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
display(isd_df.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# Spark reads parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
Azure Synapse
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather
from datetime import datetime
from dateutil.relativedelta import relativedelta
end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
# Display top 5 rows
display(isd_df.limit(5))
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
    'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
    blob_sas_token)
print('Remote blob path: ' + wasbs_path)
# Spark reads parquet lazily, so no data is loaded at this point
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands
From the Urban Innovation Initiative at Microsoft Research, data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, January 1, 2008 - present, including automated daily updating. Urban heat island effects are then examined over time and across cities, as well as aligned with population density.