
NOAA Integrated Surface Data (ISD)


Worldwide hourly weather history data (for example, temperature, precipitation, wind) from the National Oceanic and Atmospheric Administration (NOAA).

The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, although the best spatial coverage is evident in North America, Europe, Australia, and parts of Asia. Parameters included are air quality, atmospheric pressure, air temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained within the digital database, as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.

Volume and retention

This dataset is stored in Parquet format. It is updated daily, and contains about 400 million rows (20 GB) in total as of 2019.

This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.
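Those time-range parameters are just a pair of `datetime` values. As a minimal standard-library sketch (approximating "one month" as a fixed 30-day `timedelta` instead of the SDK's `relativedelta`; `month_window` is a hypothetical helper name, not part of the SDK):

```python
from datetime import datetime, timedelta

def month_window(end=None):
    """Return a (start, end) pair covering roughly the past month.

    Hypothetical helper: approximates "one month" as 30 days so it
    needs only the standard library (no dateutil dependency).
    """
    end = end or datetime.today()
    start = end - timedelta(days=30)
    return start, end

start_date, end_date = month_window(datetime(2019, 6, 30))
print(start_date.date(), end_date.date())  # 2019-05-31 2019-06-30
```

The resulting `start_date` and `end_date` can then be passed to `NoaaIsdWeather(start_date, end_date)` as in the notebook cells below.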

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from NOAA's Integrated Surface Database. For additional information about this dataset, see here and here. If you have any questions about the data source, send email to

Notices

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive damages, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.
Azure Databricks | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.
Azure Synapse | Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

usaf wban datetime latitude longitude elevation cloudCoverage stationName countryOrRegion p_k year day version month
479710 42402 1/25/2021 2:59:00 PM 27.1 142.183 8 null CHICHIJIMA JA 479710-42402 2021 25 1 1
476420 43313 1/25/2021 2:59:00 PM 35.748 139.348 141 null YOKOTA AB JA 476420-43313 2021 25 1 1
912460 41606 1/25/2021 11:59:00 AM 19.282 166.636 4 null WAKE ISLAND AIRFIELD WQ 912460-41606 2021 25 1 1
912850 21504 1/25/2021 9:59:00 AM 19.721 -155.048 12 null HILO INTERNATIONAL AIRPORT US 912850-21504 2021 25 1 1
911820 22521 1/25/2021 9:59:00 AM 21.317 -157.917 4 null HONOLULU INTERNATIONAL AIRPORT US 911820-22521 2021 25 1 1
704540 25704 1/25/2021 9:59:00 AM 51.878 -176.646 5 null ADAK NAS US 704540-25704 2021 25 1 1
702700 00489 1/25/2021 8:59:00 AM 61.266 -149.653 115 null BRYANT ARMY AIRFIELD HELIPORT US 702700-00489 2021 25 1 1
702710 26425 1/25/2021 8:59:00 AM 62.155 -145.457 482 null GULKANA AIRPORT US 702710-26425 2021 25 1 1
703610 25339 1/25/2021 8:59:00 AM 59.503 -139.65 10 null YAKUTAT AIRPORT US 703610-25339 2021 25 1 1
701719 00490 1/25/2021 8:59:00 AM 66.888 -157.162 60 null SHUNGNAK AIRPORT US 701719-00490 2021 25 1 1
Name Data type Unique Values (sample) Description
cloudCoverage string 8 CLR
OVC

The portion of the sky covered by all visible clouds. Cloud coverage values:

CLR = Clear skies FEW = Few clouds SCT = Scattered clouds BKN = Broken cloud cover OVC = Overcast OBS = Sky is obscured/can't be estimated POBS = Sky is partially obscured
countryOrRegion string 245 US
CA

The code of a country or region.

datetime timestamp 6,872,496 2019-03-06 12:00:00
2019-12-16 12:00:00

The UTC datetime of a GEOPHYSICAL-POINT-OBSERVATION.

day int 31 1
6

The day of the column datetime.

elevation double 2,368 5.0
3.0

The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to Mean Sea Level (MSL).

latitude double 34,813 38.544
31.78

The latitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where the southern hemisphere is negative.

longitude double 58,125 -86.0
-96.622

The longitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where values west from 000000 to 179999 are negative.

month int 12 1
12

The month of the column datetime.

p_k string 17,409 999999-04127
999999-63831

usaf-wban

pastWeatherIndicator int 11 2
6

The past weather indicator, showing the weather over the past hour:

0: Cloud covering 1/2 or less of the sky throughout the appropriate period 1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period 2: Cloud covering more than 1/2 of the sky throughout the appropriate period 3: Sandstorm, duststorm or blowing snow 4: Fog or ice fog or thick haze 5: Drizzle 6: Rain 7: Snow, or rain and snow mixed 8: Shower(s) 9: Thunderstorm(s) with or without precipitation
precipDepth double 5,667 9999.0
3.0

The depth of LIQUID-PRECIPITATION measured at the time of observation. Units: millimeters. Min: 0000; Max: 9998; 9999 = missing; scaling factor: 10.

precipTime double 44 1.0
24.0

The quantity of time over which the LIQUID-PRECIPITATION was measured. Units: hours. Min: 00; Max: 98; 99 = missing.

presentWeatherIndicator int 101 10
5

The present weather indicator, showing the weather in the current hour:

00: Cloud development not observed or not observable 01: Clouds generally dissolving or becoming less developed 02: State of sky on the whole unchanged 03: Clouds generally forming or developing 04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes 05: Haze 06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation 07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station 08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm 09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour For more: The section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf
seaLvlPressure double 2,214 1015.0
1014.2

The air pressure relative to Mean Sea Level (MSL).

Min: 08600 Max: 10900 Units: hectopascals

snowDepth double 652 1.0
3.0

The depth of snow and ice on the ground. Min: 0000 Max: 1200 Units: centimeters

stationName string 16,670 MURPHY 10 W
NEWTON 5 ENE

The name of the weather station.

temperature double 1,467 15.0
13.0

The air temperature. Min: -0932 Max: +0618 Units: degrees Celsius

usaf string 16,726 999999
062350

The Air Force catalog station number.

version double 1 1.0
wban string 2,556 99999
04127

The NCDC WBAN number.

windAngle int 362 180
270

The angle between true north and the direction from which the wind is blowing, measured clockwise. Min: 001 Max: 360 Units: angular degrees

windSpeed double 620 2.1
1.5

The rate of horizontal travel of air past a fixed point.

Min: 0000 Max: 0900 Units: meters per second

year int 14 2019
2020

The year of the column datetime.
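Several of the numeric fields above use sentinel values for missing data (9999 for precipDepth, 99 for precipTime), and cloudCoverage uses short codes. A minimal sketch of decoding these per the field descriptions above (the dict and function names are illustrative, not part of the dataset SDK):

```python
# Decode ISD sentinel values and cloud-coverage codes, following the
# field descriptions in the schema above. Names here are illustrative.

CLOUD_CODES = {
    "CLR": "Clear skies",
    "FEW": "Few clouds",
    "SCT": "Scattered clouds",
    "BKN": "Broken cloud cover",
    "OVC": "Overcast",
    "OBS": "Sky is obscured/can't be estimated",
    "POBS": "Sky is partially obscured",
}

def clean_precip_depth(value):
    """Return None for the 9999 'missing' sentinel, else the value."""
    return None if value == 9999.0 else value

def clean_precip_time(value):
    """Return None for the 99 'missing' sentinel, else the value."""
    return None if value == 99.0 else value

print(CLOUD_CODES["OVC"])          # Overcast
print(clean_precip_depth(9999.0))  # None
print(clean_precip_depth(3.0))     # 3.0
```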


Azure Notebooks

Package: Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Target paths: ['/year=2019/month=6/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading ISDWeather/year=2019/month=6/part-00049-tid-7654660707407597606-ec55d6c6-0d34-4a97-b2c8-d201080c9a98-89240.c000.snappy.parquet under container isdweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=116905.15 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=116907.63 [ms]
In [2]:
isd_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7790719 entries, 2709 to 11337856
Data columns (total 22 columns):
usaf                       object
wban                       object
datetime                   datetime64[ns]
latitude                   float64
longitude                  float64
elevation                  float64
windAngle                  float64
windSpeed                  float64
temperature                float64
seaLvlPressure             float64
cloudCoverage              object
presentWeatherIndicator    float64
pastWeatherIndicator       float64
precipTime                 float64
precipDepth                float64
snowDepth                  float64
stationName                object
countryOrRegion            object
p_k                        object
year                       int32
day                        int32
version                    float64
dtypes: datetime64[ns](1), float64(13), int32(2), object(6)
memory usage: 1.3+ GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
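The loop above picks the lexicographically last `.parquet` blob under the folder; because blob names embed `year=`/`month=` partition segments, reverse name order approximates "most recent first". The same selection logic, isolated as a pure function over name strings (a hypothetical helper, shown with made-up blob names):

```python
def pick_latest_parquet(blob_names, folder_name="ISDWeather/"):
    """Return the lexicographically last .parquet name under folder_name.

    Mirrors the selection loop above: ISD blob paths embed year=/month=
    partitions, so reverse name order approximates most-recent-first.
    """
    candidates = [
        n for n in blob_names
        if n.startswith(folder_name) and n.endswith(".parquet")
    ]
    return max(candidates) if candidates else ""

# Hypothetical blob names for illustration only.
names = [
    "ISDWeather/year=2019/month=5/part-0001.snappy.parquet",
    "ISDWeather/year=2019/month=6/part-0001.snappy.parquet",
    "ISDWeather/year=2019/month=6/_SUCCESS",
]
print(pick_latest_parquet(names))
# ISDWeather/year=2019/month=6/part-0001.snappy.parquet
```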
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your filter below
print('Loaded as a Pandas data frame: ')
df
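The filter hinted at above is ordinary pandas boolean indexing. A sketch against a tiny hand-built frame that reuses a few of the dataset's column names (stand-in values, not the downloaded data):

```python
import pandas as pd

# Tiny stand-in frame with a few of the dataset's column names.
df = pd.DataFrame({
    "stationName": ["HILO INTERNATIONAL AIRPORT", "GULKANA AIRPORT"],
    "countryOrRegion": ["US", "US"],
    "temperature": [24.0, -5.0],
})

# Boolean indexing: keep rows with above-freezing temperatures.
warm = df[df["temperature"] > 0]
print(len(warm))  # 1
```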

Azure Databricks

Package: Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
ActivityStarted, to_spark_dataframe ActivityStarted, to_spark_dataframe_in_worker ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=87171.59 [ms] ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=87176.63 [ms]
In [2]:
display(isd_df.limit(5))
usaf | wban | datetime | latitude | longitude | elevation | windAngle | windSpeed | temperature | seaLvlPressure | cloudCoverage | presentWeatherIndicator | pastWeatherIndicator | precipTime | precipDepth | snowDepth | stationName | countryOrRegion | p_k | year | day | version | month
726163 | 54770 | 2019-06-30T21:38:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.6 | 17.2 | null | null | 61 | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T21:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 17.2 | 1008.6 | null | null | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T22:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.1 | 18.9 | 1008.8 | CLR | null | null | 1.0 | 0.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T23:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 18.3 | 1009.1 | FEW | null | null | 6.0 | 94.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
703260 | 25503 | 2019-06-15T07:54:00.000+0000 | 58.683 | -156.656 | 15.0 | 70 | 4.1 | 10.0 | 1005.6 | null | 61 | null | 1.0 | 0.0 | null | KING SALMON AIRPORT | US | 703260-25503 | 2019 | 15 | 1.0 | 6
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Package: Language: Python
In [25]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
In [26]:
# Display top 5 rows
display(isd_df.limit(5))
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; this is lazy, so no data is loaded yet
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands

From the Urban Innovation Initiative at Microsoft Research, data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, January 1, 2008 - present, including automated daily updating. Urban heat island effects are then examined over time and across cities, as well as aligned with population density.