
NOAA Integrated Surface Data (ISD)


Worldwide hourly weather history data (temperature, precipitation, wind) sourced from the National Oceanic and Atmospheric Administration (NOAA).

The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, though the best spatial coverage is in North America, Europe, Australia, and parts of Asia. Parameters included are: air quality, atmospheric pressure, atmospheric temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained in the digital database, as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.

Volume and retention

This dataset is stored in Parquet format. It is updated daily and contains about 400 million rows (20 GB) in total as of 2019.

This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
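The SDK samples later on this page use `dateutil` to build such a window. As a minimal sketch of the window arithmetic (the fixed dates are illustrative):

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

# A one-month window; pass start_date and end_date to
# NoaaIsdWeather(start_date, end_date) to bound the rows fetched.
end_date = datetime(2019, 6, 30)
start_date = end_date - relativedelta(months=1)  # 2019-05-30
```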

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from NOAA's Integrated Surface Database. Additional information about this dataset can be found here and here. You can send an email to if you have any questions about the data source.

Notices

MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN "AS IS" BASIS. MICROSOFT MAKES NO WARRANTIES OR CONDITIONS, EXPRESS OR IMPLIED, WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE DAMAGES, RESULTING FROM YOUR USE OF THE DATASETS.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

usaf wban datetime latitude longitude elevation cloudCoverage stationName countryOrRegion p_k year day version month
727770 94012 10/21/2020 6:59:00 AM 48.543 -109.763 792 null HAVRE CITY-COUNTY AIRPORT US 727770-94012 2020 21 1 10
726798 24150 10/21/2020 6:59:00 AM 45.698 -110.44 1408 null MISSION FIELD AIRPORT US 726798-24150 2020 21 1 10
720871 00296 10/21/2020 6:59:00 AM 46.768 -100.894 593 null MANDAN MUNICIPAL AIRPORT US 720871-00296 2020 21 1 10
726777 94055 10/21/2020 6:59:00 AM 46.358 -104.25 906 null BAKER MUNICIPAL AIRPORT US 726777-94055 2020 21 1 10
720854 00282 10/21/2020 6:59:00 AM 46.925 -103.982 840 null BEACH AIRPORT US 720854-00282 2020 21 1 10
727584 94038 10/21/2020 6:59:00 AM 46.014 -102.654 824 null HETTINGER MUNICIPAL ARPT US 727584-94038 2020 21 1 10
726605 00386 10/21/2020 6:59:00 AM 44.483 -103.783 1198 null BLACK HILLS AIRPORT CLYDE ICE FIELD US 726605-00386 2020 21 1 10
720644 00226 10/21/2020 6:59:00 AM 33.417 -112.683 311 null BUCKEYE MUNICIPAL AIRPORT US 720644-00226 2020 21 1 10
727790 24146 10/21/2020 6:59:00 AM 48.304 -114.263 906 null GLACIER PARK INTERNATIONAL AIRPORT US 727790-24146 2020 21 1 10
727687 94028 10/21/2020 6:59:00 AM 47.717 -104.183 605 null SIDNEY-RICHLAND MUNI ARPT US 727687-94028 2020 21 1 10
Name Data type Unique Values (sample) Description
cloudCoverage string 8 CLR
OVC

The portion of the sky covered by all visible clouds. Cloud coverage values:

CLR = Clear skies FEW = Few clouds SCT = Scattered clouds BKN = Broken cloud cover OVC = Overcast OBS = Sky is obscured/can't be estimated POBS = Sky is partially obscured
countryOrRegion string 245 US
CA

Country or region code.

datetime timestamp 6,717,543 2018-04-16 12:00:00
2018-03-31 12:00:00

The UTC datetime of a GEOPHYSICAL-POINT-OBSERVATION.

day int 31 1
5

Day of the column datetime.

elevation double 2,367 5.0
3.0

The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to Mean Sea Level (MSL).

latitude double 34,720 38.544
31.78

The latitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where the southern hemisphere is negative.

longitude double 58,048 -86.0
-96.622

The longitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where values west from 000000 to 179999 are signed negative.

month int 12 1
3

Month of the column datetime.

p_k string 17,324 999999-54811
999999-04127

usaf-wban

pastWeatherIndicator int 11 2
6

The past weather indicator, which shows the weather within the past hour:

0: Cloud covering 1/2 or less of the sky throughout the appropriate period 1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period 2: Cloud covering more than 1/2 of the sky throughout the appropriate period 3: Sandstorm, duststorm or blowing snow 4: Fog or ice fog or thick haze 5: Drizzle 6: Rain 7: Snow, or rain and snow mixed 8: Shower(s) 9: Thunderstorm(s) with or without precipitation
precipDepth double 5,647 9999.0
3.0

The depth of LIQUID-PRECIPITATION measured at the time of an observation. Units: millimeters. MIN: 0000; MAX: 9998; 9999 = missing; SCALING FACTOR: 10.

precipTime double 44 1.0
24.0

The period of time over which the LIQUID-PRECIPITATION was measured. Units: hours. MIN: 00; MAX: 98; 99 = missing.

presentWeatherIndicator int 101 10
5

The present weather indicator, which shows the weather within the current hour:

00: Cloud development not observed or not observable 01: Clouds generally dissolving or becoming less developed 02: State of sky on the whole unchanged 03: Clouds generally forming or developing 04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes 05: Haze 06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation 07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station 08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm 09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour For more: The section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf
seaLvlPressure double 2,214 1015.0
1014.2

The air pressure relative to Mean Sea Level (MSL).

MIN: 08600 MAX: 10900 UNITS: hectopascals

snowDepth double 652 1.0
3.0

The depth of snow and ice on the ground. MIN: 0000 MAX: 1200 UNITS: centimeters

stationName string 16,599 SHABBONA 5 NNE
MURPHY 10 W

The name of the weather station.

temperature double 1,467 15.0
13.0

The air temperature. MIN: -0932 MAX: +0618 UNITS: degrees Celsius

usaf string 16,641 999999
062350

The AIR FORCE CATALOG station number.

version double 1 1.0
wban string 2,555 99999
54811

The NCDC WBAN number.

windAngle int 362 180
270

The angle, measured clockwise, between true north and the direction from which the wind is blowing. MIN: 001 MAX: 360 UNITS: angular degrees

windSpeed double 618 2.1
1.5

The rate of horizontal travel of air past a fixed point.

MIN: 0000 MAX: 0900 UNITS: meters per second

year int 13 2019
2018

Year of the column datetime.
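Several of the conventions above recur throughout the schema: sentinel values like 9999 for missing data, coded categories such as cloudCoverage, and a wind angle measured clockwise from true north. A minimal decoding sketch under those stated conventions; whether the Azure copy of the data already applies the ISD scaling factor is not confirmed here, so treat the helper names and the division by 10 as illustrative assumptions:

```python
import math

# Hypothetical decoding helpers for the schema conventions listed above.

CLOUD_COVERAGE = {
    "CLR": "Clear skies", "FEW": "Few clouds", "SCT": "Scattered clouds",
    "BKN": "Broken cloud cover", "OVC": "Overcast",
    "OBS": "Sky is obscured/can't be estimated",
    "POBS": "Sky is partially obscured",
}

def precip_depth_mm(raw):
    """Map the 9999 missing sentinel to NaN; the ISD format document
    lists a scaling factor of 10 for this field, hence the division
    (an assumption about how the stored values are encoded)."""
    if raw is None or raw == 9999.0:
        return math.nan
    return raw / 10.0

def wind_components(angle_deg, speed):
    """Split wind into eastward (u) and northward (v) components.
    windAngle is the direction the wind blows FROM, clockwise from
    true north, so both components carry a minus sign."""
    rad = math.radians(angle_deg)
    return -speed * math.sin(rad), -speed * math.cos(rad)
```

For example, `wind_components(180, 2.0)` describes a wind from the south blowing toward the north (u close to 0, v close to 2.0).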


Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Target paths: ['/year=2019/month=6/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading ISDWeather/year=2019/month=6/part-00049-tid-7654660707407597606-ec55d6c6-0d34-4a97-b2c8-d201080c9a98-89240.c000.snappy.parquet under container isdweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=116905.15 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=116907.63 [ms]
In [2]:
isd_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7790719 entries, 2709 to 11337856
Data columns (total 22 columns):
usaf                       object
wban                       object
datetime                   datetime64[ns]
latitude                   float64
longitude                  float64
elevation                  float64
windAngle                  float64
windSpeed                  float64
temperature                float64
seaLvlPressure             float64
cloudCoverage              object
presentWeatherIndicator    float64
pastWeatherIndicator       float64
precipTime                 float64
precipDepth                float64
snowDepth                  float64
stationName                object
countryOrRegion            object
p_k                        object
year                       int32
day                        int32
version                    float64
dtypes: datetime64[ns](1), float64(13), int32(2), object(6)
memory usage: 1.3+ GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your filter below
print('Loaded as a Pandas data frame: ')
df
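For instance, a hedged sketch of the kind of filter you might add, using a tiny synthetic frame with the same column names (the values are illustrative, since the real frame requires the download above):

```python
import pandas as pd

# Synthetic stand-in for the downloaded ISD frame (illustrative values).
df = pd.DataFrame({
    "stationName": ["HAVRE CITY-COUNTY AIRPORT", "MISSION FIELD AIRPORT"],
    "countryOrRegion": ["US", "US"],
    "temperature": [15.0, -3.5],
})

# Keep only rows with above-freezing temperatures.
warm = df[df["temperature"] > 0]
print(warm["stationName"].tolist())  # ['HAVRE CITY-COUNTY AIRPORT']
```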

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=87171.59 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=87176.63 [ms]
In [2]:
display(isd_df.limit(5))
usaf | wban | datetime | latitude | longitude | elevation | windAngle | windSpeed | temperature | seaLvlPressure | cloudCoverage | presentWeatherIndicator | pastWeatherIndicator | precipTime | precipDepth | snowDepth | stationName | countryOrRegion | p_k | year | day | version | month
726163 | 54770 | 2019-06-30T21:38:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.6 | 17.2 | null | null | 61 | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T21:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 17.2 | 1008.6 | null | null | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T22:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.1 | 18.9 | 1008.8 | CLR | null | null | 1.0 | 0.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T23:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 18.3 | 1009.1 | FEW | null | null | 6.0 | 94.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
703260 | 25503 | 2019-06-15T07:54:00.000+0000 | 58.683 | -156.656 | 15.0 | 70 | 4.1 | 10.0 | 1005.6 | null | 61 | null | 1.0 | 0.0 | null | KING SALMON AIRPORT | US | 703260-25503 | 2019 | 15 | 1.0 | 6
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the parquet files; note that no data is loaded yet (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))
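With the `source` view registered, any Spark SQL can be composed against it. A hedged sketch of a parameterized aggregation (the query itself is illustrative, not part of the official sample; `year` and `month` are the partition columns described in the schema above):

```python
def monthly_avg_temperature_sql(year, month):
    """Build an illustrative query against the 'source' temporary view:
    average temperature per station for one year/month partition."""
    return (
        "SELECT stationName, AVG(temperature) AS avgTemp "
        "FROM source "
        f"WHERE year = {year} AND month = {month} "
        "GROUP BY stationName"
    )

# Usage in the notebook above:
# display(spark.sql(monthly_avg_temperature_sql(2019, 6)))
```

Filtering on the partition columns lets Spark prune partitions and avoid scanning the full dataset.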

Azure Synapse

Language: Python
In [25]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
In [26]:
# Display top 5 rows
display(isd_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Spark reads the parquet files; note that no data is loaded yet (lazy evaluation)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands

From the Urban Innovation Initiative at Microsoft Research, data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, January 1, 2008 - present, including automated daily updating. Urban heat island effects are then examined over time and across cities, as well as aligned with population density.
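As a rough illustration of how such an index can be defined (this is a common textbook formulation of urban heat island intensity, not necessarily the exact method used by the Microsoft Research scripts): the daily index for a city is often taken as the mean temperature of its urban stations minus the mean of nearby rural reference stations.

```python
def daily_uhi_index(urban_temps, rural_temps):
    """Illustrative UHI intensity: mean urban minus mean rural
    temperature (degrees Celsius) for one day. Assumes both lists
    hold non-empty daily station means drawn from the ISD data."""
    urban_mean = sum(urban_temps) / len(urban_temps)
    rural_mean = sum(rural_temps) / len(rural_temps)
    return urban_mean - rural_mean
```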