
NOAA Integrated Surface Data (ISD)

Weather ISD NOAA

Worldwide hourly weather history data (for example, temperature, precipitation, and wind) from NOAA (National Oceanic and Atmospheric Administration).

The Integrated Surface Dataset (ISD) consists of worldwide surface weather observations from over 35,000 stations, with the best spatial coverage in North America, Europe, Australia, and parts of Asia. Parameters include: air quality, atmospheric pressure, atmospheric temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained within the digital database as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contains about 400 million rows (20 GB) in total as of 2019.

This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
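As a hedged illustration of what that time-range restriction amounts to, the same filter can be applied locally on the dataset's `datetime` column. The frame below is invented and only borrows the column names from the schema:

```python
import pandas as pd

# Toy frame using the dataset's datetime column name; the values are invented.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-06-01", "2019-06-15", "2019-07-02"]),
    "temperature": [17.2, 18.9, 21.0],
})

# Keep only observations inside a chosen window (inclusive on both ends).
start, end = pd.Timestamp("2019-06-01"), pd.Timestamp("2019-06-30")
june = df[df["datetime"].between(start, end)]
print(len(june))  # 2 rows fall inside the window
```

The SDK performs the equivalent filtering for you when you pass a start and end date, so you rarely need to do this by hand on the full dataset.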

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the NOAA Integrated Surface Database. You can find additional information about the dataset here and here. Send an email to  if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER APPLICABLE LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL, OR PUNITIVE DAMAGES, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in / When to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

usaf wban datetime latitude longitude elevation cloudCoverage stationName countryOrRegion p_k year day version month
725744 24027 5/6/2021 6:59:00 AM 41.595 -109.052 2056 null RCK SRINGS-SWETWTER CO APT US 725744-24027 2021 6 1 5
740002 03042 5/6/2021 6:59:00 AM 37.5 -105.166 3114 null LA VETA PASS AWOS-3 ARPT US 740002-03042 2021 6 1 5
720385 00419 5/6/2021 6:59:00 AM 39.8 -105.766 4113 null BERTHOUD PASS US 720385-00419 2021 6 1 5
726710 24164 5/6/2021 6:59:00 AM 42.584 -110.107 2126 null BIG PINEY-MARBLETON ARPT US 726710-24164 2021 6 1 5
720345 94086 5/6/2021 6:59:00 AM 42.796 -109.806 2160 null RALPH WENZ FIELD AIRPORT US 720345-94086 2021 6 1 5
722061 03038 5/6/2021 6:59:00 AM 39.467 -106.15 3680 null COPPER MOUNTAIN US 722061-03038 2021 6 1 5
A00005 94076 5/6/2021 6:59:00 AM 40.054 -106.368 2259 null MC ELORY AIRFIELD AIRPORT US A00005-94076 2021 6 1 5
727755 24112 5/6/2021 6:59:00 AM 47.517 -111.183 1058 null MALMSTROM AFHP HELIPORT US 727755-24112 2021 6 1 5
723270 13897 5/6/2021 5:59:00 AM 36.119 -86.689 184 null NASHVILLE INTERNATIONAL AIRPORT US 723270-13897 2021 6 1 5
723761 23901 5/6/2021 5:59:00 AM 30.516 -96.704 119 null CALDWELL MUNICIPAL AIRPORT US 723761-23901 2021 6 1 5
Name Data type Unique Values (sample) Description
cloudCoverage string 8 CLR
OVC

The portion of the sky covered by all visible clouds. Cloud coverage values:

CLR = Clear skies FEW = Few clouds SCT = Scattered clouds BKN = Broken cloud cover OVC = Overcast OBS = Sky is obscured/can't be estimated POBS = Sky is partially obscured
countryOrRegion string 245 US
CA

The country or region code.

datetime timestamp 7,012,133 2019-10-30 12:00:00
2019-04-02 12:00:00

The UTC datetime of a GEOPHYSICAL-POINT-OBSERVATION.

day int 31 1
4

The day of the column datetime.

elevation double 2,370 5.0
3.0

The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to mean sea level.

latitude double 34,895 38.544
31.78

The latitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where the southern hemisphere is negative.

longitude double 58,248 -86.0
-96.622

The longitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where values west from 000000 to 179999 are signed negative.

month int 12 1
3

The month of the column datetime.

p_k string 17,434 999999-21514
999999-04222

usaf-wban

pastWeatherIndicator int 11 2
6

Past weather indicator, showing the weather within the past hour:

0: Cloud covering 1/2 or less of the sky throughout the appropriate period 1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period 2: Cloud covering more than 1/2 of the sky throughout the appropriate period 3: Sandstorm, duststorm or blowing snow 4: Fog or ice fog or thick haze 5: Drizzle 6: Rain 7: Snow, or rain and snow mixed 8: Shower(s) 9: Thunderstorm(s) with or without precipitation
precipDepth double 5,681 9999.0
3.0

The depth of PRECIPITATION measured at the time of an observation. Units: millimeters. MIN: 0000; MAX: 9998; 9999 = missing; SCALING FACTOR: 10.

precipTime double 44 1.0
24.0

The quantity of time over which the PRECIPITATION was measured. Units: hours. MIN: 00; MAX: 98; 99 = missing.

presentWeatherIndicator int 101 10
61

Present weather indicator, showing the weather within the current hour:

00: Cloud development not observed or not observable 01: Clouds generally dissolving or becoming less developed 02: State of sky on the whole unchanged 03: Clouds generally forming or developing 04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes 05: Haze 06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation 07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station 08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm 09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour For more: The section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf
seaLvlPressure double 2,214 1015.0
1014.0

The air pressure relative to mean sea level.

MIN: 08600 MAX: 10900 UNITS: hectopascals

snowDepth double 652 1.0
3.0

The depth of snow and ice on the ground. MIN: 0000 MAX: 1200 UNITS: centimeters

stationName string 16,694 MAUNA LOA 5 NNE
REDDING 12 WNW

The name of the weather station.

temperature double 1,469 15.0
13.0

The air temperature. MIN: -0932 MAX: +0618 UNITS: degrees Celsius

usaf string 16,751 999999
062350

The AIR FORCE CATALOG station number.

version double 1 1.0
wban string 2,557 99999
21514

The NCDC WBAN number.

windAngle int 362 180
270

The angle, measured in a clockwise direction, between true north and the direction from which the wind is blowing. MIN: 001 MAX: 360 UNITS: angular degrees

windSpeed double 621 2.1
1.5

The rate of horizontal travel of air past a fixed point.

MIN: 0000 MAX: 0900 UNITS: meters per second

year int 14 2019
2020

The year of the column datetime.
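Several of the numeric columns above use sentinel codes for missing data (precipDepth 9999, precipTime 99), and windAngle/windSpeed together describe a polar wind vector. A minimal pandas sketch of the corresponding cleanup, on an invented frame that only borrows the column names and sentinel codes from the schema:

```python
import numpy as np
import pandas as pd

# Invented rows; column names and sentinel codes follow the schema above.
df = pd.DataFrame({
    "precipDepth": [9999.0, 3.0, 250.0],   # 9999 = missing
    "precipTime": [99.0, 1.0, 24.0],       # 99 = missing
    "windAngle": [180.0, 270.0, 360.0],    # direction the wind blows FROM
    "windSpeed": [2.1, 1.5, 0.0],          # meters per second
})

# Replace sentinel codes with NaN so they do not skew aggregations.
df["precipDepth"] = df["precipDepth"].replace(9999.0, np.nan)
df["precipTime"] = df["precipTime"].replace(99.0, np.nan)

# Decompose the wind vector into eastward (u) and northward (v) components.
# windAngle is measured clockwise from true north.
theta = np.deg2rad(df["windAngle"])
df["u"] = -df["windSpeed"] * np.sin(theta)
df["v"] = -df["windSpeed"] * np.cos(theta)
```

This is a sketch, not part of the dataset's official tooling; adapt the sentinel handling to whichever columns you actually use.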

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Target paths: ['/year=2019/month=6/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading ISDWeather/year=2019/month=6/part-00049-tid-7654660707407597606-ec55d6c6-0d34-4a97-b2c8-d201080c9a98-89240.c000.snappy.parquet under container isdweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=116905.15 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=116907.63 [ms]
In [2]:
isd_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7790719 entries, 2709 to 11337856
Data columns (total 22 columns):
usaf                       object
wban                       object
datetime                   datetime64[ns]
latitude                   float64
longitude                  float64
elevation                  float64
windAngle                  float64
windSpeed                  float64
temperature                float64
seaLvlPressure             float64
cloudCoverage              object
presentWeatherIndicator    float64
pastWeatherIndicator       float64
precipTime                 float64
precipDepth                float64
snowDepth                  float64
stationName                object
countryOrRegion            object
p_k                        object
year                       int32
day                        int32
version                    float64
dtypes: datetime64[ns](1), float64(13), int32(2), object(6)
memory usage: 1.3+ GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().readinto(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your filter below
print('Loaded as a Pandas data frame: ')
df

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=87171.59 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=87176.63 [ms]
In [2]:
display(isd_df.limit(5))
usaf | wban | datetime | latitude | longitude | elevation | windAngle | windSpeed | temperature | seaLvlPressure | cloudCoverage | presentWeatherIndicator | pastWeatherIndicator | precipTime | precipDepth | snowDepth | stationName | countryOrRegion | p_k | year | day | version | month
726163 | 54770 | 2019-06-30T21:38:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.6 | 17.2 | null | null | 61 | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T21:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 17.2 | 1008.6 | null | null | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T22:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.1 | 18.9 | 1008.8 | CLR | null | null | 1.0 | 0.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T23:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 18.3 | 1009.1 | FEW | null | null | 6.0 | 94.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
703260 | 25503 | 2019-06-15T07:54:00.000+0000 | 58.683 | -156.656 | 15.0 | 70 | 4.1 | 10.0 | 1005.6 | null | 61 | null | 1.0 | 0.0 | null | KING SALMON AIRPORT | US | 703260-25503 | 2019 | 15 | 1.0 | 6
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note that no data is loaded yet (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Language: Python
In [25]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
In [26]:
# Display top 5 rows
display(isd_df.limit(5))
Out[26]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note that no data is loaded yet (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands

From the Urban Innovation Initiative at Microsoft Research: data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, from January 1, 2008 to the present, with automated daily updating. Urban heat island effects are then examined over time and across cities, and aligned with population density.
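As a rough sketch of what a daily urban heat island index can look like, one common formulation is the mean temperature difference between an urban station and a nearby rural reference. The station pairing and all values below are invented, and the Urban Innovation Initiative's actual methodology may differ:

```python
import pandas as pd

# Invented hourly temperatures for one day at a paired urban/rural station.
obs = pd.DataFrame({
    "station": ["urban"] * 3 + ["rural"] * 3,
    "temperature": [26.0, 30.0, 24.0, 23.0, 27.0, 21.0],
})

# Daily UHI index: mean urban temperature minus mean rural temperature.
daily_mean = obs.groupby("station")["temperature"].mean()
uhi_index = daily_mean["urban"] - daily_mean["rural"]
print(round(uhi_index, 1))  # 3.0
```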