
NOAA Integrated Surface Data (ISD)

Weather ISD NOAA

Worldwide hourly weather history data (e.g., temperature, precipitation, wind) sourced from NOAA (National Oceanic and Atmospheric Administration).

The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is evident in North America, Europe, Australia, and parts of Asia. Parameters included are: air quality, atmospheric pressure, atmospheric temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained within the digital database as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contains about 400 million rows (20 GB) in total as of 2019.

This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from NOAA's Integrated Surface Database. More information about this dataset can be found here and here. Email with any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in / When to use:

Azure Notebooks
Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks
Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse
Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

usaf wban datetime latitude longitude elevation cloudCoverage stationName countryOrRegion p_k year day version month
726985 24242 9/27/2020 7:59:00 AM 45.551 -122.408 9 null PORTLAND-TROUTDALE AIRPORT US 726985-24242 2020 27 1 9
727855 24114 9/27/2020 7:59:00 AM 47.633 -117.65 750 null FAIRCHILD AIR FORCE BASE US 727855-24114 2020 27 1 9
726986 94261 9/27/2020 7:59:00 AM 45.541 -122.948 70 null PORTLAND-HILLSBORO AIRPORT US 726986-94261 2020 27 1 9
723910 93111 9/27/2020 7:59:00 AM 34.117 -119.116 4 null POINT MUGU US 723910-93111 2020 27 1 9
745056 53120 9/27/2020 7:59:00 AM 33.038 -116.915 425 null RAMONA AIRPORT US 745056-53120 2020 27 1 9
720646 00228 9/27/2020 7:59:00 AM 37.513 -122.501 20 null HALF MOON BAY AIRPORT US 720646-00228 2020 27 1 9
725975 24235 9/27/2020 7:59:00 AM 42.6 -123.364 1171 null SEXTON SUMMIT US 725975-24235 2020 27 1 9
720839 00279 9/27/2020 7:59:00 AM 39.667 -119.876 1540 null RENO STEAD AIRPORT US 720839-00279 2020 27 1 9
723894 03181 9/27/2020 7:59:00 AM 37.633 -118.85 2173 null MAMMOTH YOSEMITE AIRPORT US 723894-03181 2020 27 1 9
726950 24285 9/27/2020 7:59:00 AM 44.583 -124.05 49 null NEWPORT MUNICIPAL AIRPORT US 726950-24285 2020 27 1 9
Columns (name, data type, number of unique values, sample values, description):

cloudCoverage (string; 8 unique; e.g. CLR, OVC)
The fraction of the sky covered by all visible clouds. Cloud coverage values:
CLR = Clear skies; FEW = Few clouds; SCT = Scattered clouds; BKN = Broken cloud cover; OVC = Overcast; OBS = Sky is obscured/can't be estimated; POBS = Sky is partially obscured

countryOrRegion (string; 245 unique; e.g. US, CA)
Country or region code.

datetime (timestamp; 6,691,675 unique; e.g. 2018-02-19 12:00:00, 2018-03-15 12:00:00)
The UTC datetime of a GEOPHYSICAL-POINT-OBSERVATION.

day (int; 31 unique; e.g. 1, 6)
The day of the datetime column.

elevation (double; 2,365 unique; e.g. 5.0, 3.0)
The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to Mean Sea Level (MSL).

latitude (double; 34,717 unique; e.g. 38.544, 31.78)
The latitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION; southern hemisphere values are negative.

longitude (double; 58,037 unique; e.g. -86.0, -96.622)
The longitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION; values west of 000000 through 179999 are signed negative.

month (int; 12 unique; e.g. 1, 3)
The month of the datetime column.

p_k (string; 17,303 unique; e.g. 999999-63855, 999999-94074)
usaf-wban

pastWeatherIndicator (int; 11 unique; e.g. 2, 6)
Past weather indicator, showing the weather in the past hour:
0: Cloud covering 1/2 or less of the sky throughout the appropriate period
1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period
2: Cloud covering more than 1/2 of the sky throughout the appropriate period
3: Sandstorm, duststorm or blowing snow
4: Fog or ice fog or thick haze
5: Drizzle
6: Rain
7: Snow, or rain and snow mixed
8: Shower(s)
9: Thunderstorm(s) with or without precipitation

precipDepth (double; 5,637 unique; e.g. 9999.0, 3.0)
The depth of LIQUID-PRECIPITATION measured at the time of the observation. UNITS: millimeters. MIN: 0000; MAX: 9998; 9999 = missing; SCALING FACTOR: 10.

precipTime (double; 44 unique; e.g. 1.0, 24.0)
The quantity of time over which the LIQUID-PRECIPITATION was measured. UNITS: hours. MIN: 00; MAX: 98; 99 = missing.

presentWeatherIndicator (int; 101 unique; e.g. 10, 5)
Present weather indicator, showing the weather in the current hour:
00: Cloud development not observed or not observable
01: Clouds generally dissolving or becoming less developed
02: State of sky on the whole unchanged
03: Clouds generally forming or developing
04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes
05: Haze
06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation
07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) or sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station
08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm
09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour
For more, see section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf

seaLvlPressure (double; 2,214 unique; e.g. 1015.0, 1014.0)
The air pressure relative to Mean Sea Level (MSL). MIN: 08600; MAX: 10900; UNITS: hectopascals.

snowDepth (double; 652 unique; e.g. 1.0, 3.0)
The depth of snow and ice on the ground. MIN: 0000; MAX: 1200; UNITS: centimeters.

stationName (string; 16,578 unique; e.g. CROSSVILLE 7 NW, NUNN 7 NNE)
Weather station name.

temperature (double; 1,467 unique; e.g. 15.0, 13.0)
The air temperature. MIN: -0932; MAX: +0618; UNITS: degrees Celsius.

usaf (string; 16,620 unique; e.g. 999999, 062350)
AIR FORCE CATALOG station number.

version (double; 1 unique; e.g. 1.0)

wban (string; 2,555 unique; e.g. 99999, 63855)
NCDC WBAN number.

windAngle (int; 362 unique; e.g. 180, 270)
The angle, measured clockwise, between true north and the direction from which the wind is blowing. MIN: 001; MAX: 360; UNITS: angular degrees.

windSpeed (double; 617 unique; e.g. 2.1, 1.5)
The rate of horizontal travel of air past a fixed point. MIN: 0000; MAX: 0900; UNITS: meters per second.

year (int; 13 unique; e.g. 2019, 2018)
The year of the datetime column.

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Target paths: ['/year=2019/month=6/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading ISDWeather/year=2019/month=6/part-00049-tid-7654660707407597606-ec55d6c6-0d34-4a97-b2c8-d201080c9a98-89240.c000.snappy.parquet under container isdweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=116905.15 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=116907.63 [ms]
In [2]:
isd_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7790719 entries, 2709 to 11337856
Data columns (total 22 columns):
usaf                       object
wban                       object
datetime                   datetime64[ns]
latitude                   float64
longitude                  float64
elevation                  float64
windAngle                  float64
windSpeed                  float64
temperature                float64
seaLvlPressure             float64
cloudCoverage              object
presentWeatherIndicator    float64
pastWeatherIndicator       float64
precipTime                 float64
precipDepth                float64
snowDepth                  float64
stationName                object
countryOrRegion            object
p_k                        object
year                       int32
day                        int32
version                    float64
dtypes: datetime64[ns](1), float64(13), int32(2), object(6)
memory usage: 1.3+ GB
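With the month of data in a DataFrame, a typical first step is a groupby aggregation. A sketch below uses a tiny hypothetical stand-in frame (the real isd_df loaded above has the same columns) to compute the mean temperature per station per day:

```python
import pandas as pd

# Hypothetical stand-in for the isd_df loaded above.
isd_df = pd.DataFrame({
    "stationName": ["JAFFREY", "JAFFREY", "KING SALMON"],
    "datetime": pd.to_datetime([
        "2019-06-30 21:38:00", "2019-06-30 22:52:00", "2019-06-15 07:54:00",
    ]),
    "temperature": [17.2, 18.9, 10.0],
})

# Mean temperature per station per calendar day (normalize() truncates to midnight).
daily = (
    isd_df
    .groupby(["stationName", isd_df["datetime"].dt.normalize()])["temperature"]
    .mean()
)
print(daily)
```

The same pattern extends to windSpeed, precipDepth, and the other numeric columns.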
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your own filter below.
print('Loaded as a Pandas data frame: ')
df
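The cell above leaves room for filters. For example, a sketch (using a hypothetical stand-in for the df read from the parquet file) that keeps only US stations with a valid temperature reading:

```python
import pandas as pd

# Hypothetical stand-in for the df read from the parquet file above.
df = pd.DataFrame({
    "countryOrRegion": ["US", "CA", "US"],
    "stationName": ["POINT MUGU", "TORONTO", "RAMONA AIRPORT"],
    "temperature": [15.0, None, 13.0],
})

# Keep only US rows that have a temperature reading.
filtered = df[(df["countryOrRegion"] == "US") & df["temperature"].notna()]
print(len(filtered))  # → 2
```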

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=87171.59 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=87176.63 [ms]
In [2]:
display(isd_df.limit(5))
usaf wban datetime latitude longitude elevation windAngle windSpeed temperature seaLvlPressure cloudCoverage presentWeatherIndicator pastWeatherIndicator precipTime precipDepth snowDepth stationName countryOrRegion p_k year day version month
726163 54770 2019-06-30T21:38:00.000+0000 42.805 -72.004 317.0 null 2.6 17.2 null null 61 null 1.0 43.0 null JAFFREY MINI-SLVR RNCH APT US 726163-54770 2019 30 1.0 6
726163 54770 2019-06-30T21:52:00.000+0000 42.805 -72.004 317.0 null 1.5 17.2 1008.6 null null null 1.0 43.0 null JAFFREY MINI-SLVR RNCH APT US 726163-54770 2019 30 1.0 6
726163 54770 2019-06-30T22:52:00.000+0000 42.805 -72.004 317.0 null 2.1 18.9 1008.8 CLR null null 1.0 0.0 null JAFFREY MINI-SLVR RNCH APT US 726163-54770 2019 30 1.0 6
726163 54770 2019-06-30T23:52:00.000+0000 42.805 -72.004 317.0 null 1.5 18.3 1009.1 FEW null null 6.0 94.0 null JAFFREY MINI-SLVR RNCH APT US 726163-54770 2019 30 1.0 6
703260 25503 2019-06-15T07:54:00.000+0000 58.683 -156.656 15.0 70 4.1 10.0 1005.6 null 61 null 1.0 0.0 null KING SALMON AIRPORT US 703260-25503 2019 15 1.0 6
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Language: Python
In [25]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
In [26]:
# Display top 5 rows
display(isd_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow Spark to read from the blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands

From the Urban Innovation Initiative at Microsoft Research, data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, January 1, 2008 - present, including automated daily updating. Urban heat island effects are then examined over time and across cities, as well as aligned with population density.