
NOAA Integrated Surface Data (ISD)

Weather ISD NOAA

Worldwide hourly weather history data (for example, temperature, precipitation, wind) sourced from the National Oceanic and Atmospheric Administration (NOAA).

The Integrated Surface Dataset (ISD) is composed of worldwide surface weather observations from over 35,000 stations, although the best spatial coverage is evident in North America, Europe, Australia, and parts of Asia. Parameters included are: air quality, atmospheric pressure, air temperature/dew point, atmospheric winds, clouds, precipitation, ocean waves, tides, and more. ISD refers to the data contained within the digital database, as well as the format in which the hourly, synoptic (3-hourly), and daily weather observations are stored.

Volume and retention

This dataset is stored in Parquet format. It is updated daily, and contains about 400 million rows (20 GB) in total as of 2019.

This dataset contains historical records accumulated from 2008 to the present. You can use parameter settings in our SDK to fetch data within a specific time range.

Storage location

This dataset is stored in the East US Azure region. Allocating compute resources in East US is recommended for affinity.

Additional information

This dataset is sourced from the NOAA Integrated Surface Database. Additional information about this dataset can be found here. For any questions about the data source, send email to

Notices

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees, or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental, or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in | When to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Azure Databricks

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Azure Synapse

Use this when you need the scale of an Azure managed Spark cluster to process the dataset.

Preview

usaf | wban | datetime | latitude | longitude | elevation | cloudCoverage | stationName | countryOrRegion | p_k | year | day | version | month
726917 | 24284 | 9/18/2020 7:59:00 AM | 43.413 | -124.243 | 5 | null | NORTH BEND MUNICIPAL ARPT | US | 726917-24284 | 2020 | 18 | 1 | 9
720267 | 23224 | 9/18/2020 7:59:00 AM | 38.955 | -121.081 | 467 | null | AUBURN MUNICIPAL AIRPORT | US | 720267-23224 | 2020 | 18 | 1 | 9
727815 | 24237 | 9/18/2020 7:59:00 AM | 47.277 | -121.337 | 1209 | null | STAMPASS PASS FLTWO | US | 727815-24237 | 2020 | 18 | 1 | 9
727938 | 94274 | 9/18/2020 7:59:00 AM | 47.268 | -122.576 | 96 | null | TACOMA NARROWS AIRPORT | US | 727938-94274 | 2020 | 18 | 1 | 9
720272 | 94282 | 9/18/2020 7:59:00 AM | 48.467 | -122.416 | 44 | null | SKAGIT REGIONAL AIRPORT | US | 720272-94282 | 2020 | 18 | 1 | 9
720923 | 00310 | 9/18/2020 7:59:00 AM | 48.726 | -116.295 | 711 | null | BOUNDARY COUNTY AIRPORT | US | 720923-00310 | 2020 | 18 | 1 | 9
727937 | 24222 | 9/18/2020 7:59:00 AM | 47.908 | -122.28 | 185 | null | SNOHOMISH CO (PAINE FD) AP | US | 727937-24222 | 2020 | 18 | 1 | 9
722208 | 04224 | 9/18/2020 7:59:00 AM | 48.708 | -122.91 | 9 | null | ORCAS ISLAND AIRPORT | US | 722208-04224 | 2020 | 18 | 1 | 9
720646 | 00228 | 9/18/2020 7:59:00 AM | 37.513 | -122.501 | 20 | null | HALF MOON BAY AIRPORT | US | 720646-00228 | 2020 | 18 | 1 | 9
A06854 | 00115 | 9/18/2020 7:59:00 AM | 34.264 | -116.854 | 2057 | null | BIG BEAR CITY AIRPORT | US | A06854-00115 | 2020 | 18 | 1 | 9
Name Data type Unique Values (sample) Description
cloudCoverage string 8 CLR
OVC

The portion of the sky covered by all visible clouds. Cloud coverage values:

CLR = Clear skies FEW = Few clouds SCT = Scattered clouds BKN = Broken cloud cover OVC = Overcast OBS = Sky is obscured/can't be estimated POBS = Sky is partially obscured
countryOrRegion string 245 US
CA

The country or region code.

datetime timestamp 6,683,044 2019-01-13 12:00:00
2020-02-15 12:00:00

The UTC datetime of a GEOPHYSICAL-POINT-OBSERVATION.

day int 31 1
6

The day of the column datetime.

elevation double 2,365 5.0
3.0

The elevation of a GEOPHYSICAL-POINT-OBSERVATION relative to Mean Sea Level (MSL).

latitude double 34,717 38.544
31.78

The latitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where the southern hemisphere is negative.

longitude double 58,037 -86.0
-96.622

The longitude coordinate of a GEOPHYSICAL-POINT-OBSERVATION, where values west from 000000 to 179999 are negative.

month int 12 1
3

The month of the column datetime.

p_k string 17,300 999999-54797
999999-94644

usaf-wban

pastWeatherIndicator int 11 2
6

The past weather indicator, which shows the weather within the past hour:

0: Cloud covering 1/2 or less of the sky throughout the appropriate period 1: Cloud covering more than 1/2 of the sky during part of the appropriate period and covering 1/2 or less during part of the period 2: Cloud covering more than 1/2 of the sky throughout the appropriate period 3: Sandstorm, duststorm or blowing snow 4: Fog or ice fog or thick haze 5: Drizzle 6: Rain 7: Snow, or rain and snow mixed 8: Shower(s) 9: Thunderstorm(s) with or without precipitation
precipDepth double 5,635 9999.0
3.0

The depth of precipitation measured at the time of observation. Units: millimeters. Min: 0000; Max: 9998; 9999 = missing; scaling factor: 10.

precipTime double 44 1.0
24.0

The quantity of time over which the precipitation was measured. Units: hours. Min: 00; Max: 98; 99 = missing.

presentWeatherIndicator int 101 10
5

The present weather indicator, which shows the weather within the current hour:

00: Cloud development not observed or not observable 01: Clouds generally dissolving or becoming less developed 02: State of sky on the whole unchanged 03: Clouds generally forming or developing 04: Visibility reduced by smoke, e.g. veldt or forest fires, industrial smoke or volcanic ashes 05: Haze 06: Widespread dust in suspension in the air, not raised by wind at or near the station at the time of observation 07: Dust or sand raised by wind at or near the station at the time of observation, but no well-developed dust whirl(s) sand whirl(s), and no duststorm or sandstorm seen or, in the case of ships, blowing spray at the station 08: Well developed dust whirl(s) or sand whirl(s) seen at or near the station during the preceding hour or at the time of observation, but no duststorm or sandstorm 09: Duststorm or sandstorm within sight at the time of observation, or at the station during the preceding hour For more: The section 'MW1' in ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-format-document.pdf
seaLvlPressure double 2,214 1015.0
1014.2

The air pressure relative to Mean Sea Level (MSL).

Min: 08600 Max: 10900 Units: hectopascals

snowDepth double 652 1.0
3.0

The depth of snow and ice on the ground. Min: 0000 Max: 1200 Units: centimeters

stationName string 16,575 KINGSTON 1 W
OLD TOWN 2 W

The name of the weather station.

temperature double 1,467 15.0
13.0

The air temperature. Min: -0932 Max: +0618 Units: degrees Celsius

usaf string 16,617 999999
062350

The Air Force catalog station number.

version double 1 1.0
wban string 2,555 99999
54797

The NCDC WBAN number.

windAngle int 362 180
270

The angle, measured in a clockwise direction, between true north and the direction from which the wind is blowing. Min: 001 Max: 360 Units: degrees

windSpeed double 617 2.1
1.5

The rate of horizontal travel of air past a fixed point.

Min: 0000 Max: 0900 Units: meters per second

year int 13 2019
2018

The year of the column datetime.
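Several fields above use sentinel values for missing data (precipDepth uses 9999, precipTime uses 99). A minimal pandas sketch of replacing those sentinels with NaN before analysis; the small frame here is illustrative sample data, not real ISD records:

```python
import numpy as np
import pandas as pd

# Illustrative sample rows; the real frame comes from NoaaIsdWeather or the parquet files.
df = pd.DataFrame({
    "precipDepth": [3.0, 9999.0, 12.0],  # 9999 = missing, per the field description
    "precipTime": [1.0, 99.0, 24.0],     # 99 = missing, per the field description
})

# Replace the documented missing-value sentinels with NaN so aggregates ignore them.
sentinels = {"precipDepth": 9999.0, "precipTime": 99.0}
for col, sentinel in sentinels.items():
    df[col] = df[col].replace(sentinel, np.nan)

print(df["precipDepth"].mean())  # computed over non-missing values only
```

Without this step, a single 9999 sentinel would badly skew any mean or sum over precipDepth.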

Select your preferred service:

Azure Notebooks

Azure Databricks

Azure Synapse

Azure Notebooks

Language: Python
In [1]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)

# Get historical weather data in the past month.
isd = NoaaIsdWeather(start_date, end_date)
# Read into Pandas data frame.
isd_df = isd.to_pandas_dataframe()
ActivityStarted, to_pandas_dataframe
ActivityStarted, to_pandas_dataframe_in_worker
Target paths: ['/year=2019/month=6/']
Looking for parquet files...
Reading them into Pandas dataframe...
Reading ISDWeather/year=2019/month=6/part-00049-tid-7654660707407597606-ec55d6c6-0d34-4a97-b2c8-d201080c9a98-89240.c000.snappy.parquet under container isdweatherdatacontainer
Done.
ActivityCompleted: Activity=to_pandas_dataframe_in_worker, HowEnded=Success, Duration=116905.15 [ms]
ActivityCompleted: Activity=to_pandas_dataframe, HowEnded=Success, Duration=116907.63 [ms]
In [2]:
isd_df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 7790719 entries, 2709 to 11337856
Data columns (total 22 columns):
usaf                       object
wban                       object
datetime                   datetime64[ns]
latitude                   float64
longitude                  float64
elevation                  float64
windAngle                  float64
windSpeed                  float64
temperature                float64
seaLvlPressure             float64
cloudCoverage              object
presentWeatherIndicator    float64
pastWeatherIndicator       float64
precipTime                 float64
precipDepth                float64
snowDepth                  float64
stationName                object
countryOrRegion            object
p_k                        object
year                       int32
day                        int32
version                    float64
dtypes: datetime64[ns](1), float64(13), int32(2), object(6)
memory usage: 1.3+ GB
In [1]:
# Pip install packages
import os, sys

!{sys.executable} -m pip install azure-storage-blob
!{sys.executable} -m pip install pyarrow
!{sys.executable} -m pip install pandas
In [2]:
# Azure storage access info
azure_storage_account_name = "azureopendatastorage"
azure_storage_sas_token = r""
container_name = "isdweatherdatacontainer"
folder_name = "ISDWeather/"
In [3]:
from azure.storage.blob import BlobServiceClient, BlobClient, ContainerClient

if azure_storage_account_name is None or azure_storage_sas_token is None:
    raise Exception(
        "Provide your specific name and key for your Azure Storage account--see the Prerequisites section earlier.")

print('Looking for the first parquet under the folder ' +
      folder_name + ' in container "' + container_name + '"...')
container_url = f"https://{azure_storage_account_name}.blob.core.windows.net/"
blob_service_client = BlobServiceClient(
    container_url, azure_storage_sas_token if azure_storage_sas_token else None)

container_client = blob_service_client.get_container_client(container_name)
blobs = container_client.list_blobs(folder_name)
sorted_blobs = sorted(list(blobs), key=lambda e: e.name, reverse=True)
targetBlobName = ''
for blob in sorted_blobs:
    if blob.name.startswith(folder_name) and blob.name.endswith('.parquet'):
        targetBlobName = blob.name
        break

print('Target blob to download: ' + targetBlobName)
_, filename = os.path.split(targetBlobName)
blob_client = container_client.get_blob_client(targetBlobName)
with open(filename, 'wb') as local_file:
    blob_client.download_blob().download_to_stream(local_file)
In [4]:
# Read the parquet file into Pandas data frame
import pandas as pd

print('Reading the parquet file into Pandas data frame')
df = pd.read_parquet(filename)
In [5]:
# You can add your filter below
print('Loaded as a Pandas data frame: ')
df
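As a sketch of the kind of filter the cell above invites, the snippet below selects US observations from the second half of a month. The frame here is a small illustrative stand-in with the dataset's column names; in the notebook you would apply the same expression to the df loaded from the parquet file:

```python
import pandas as pd

# Illustrative frame using the dataset's column names; the real df is read from parquet.
df = pd.DataFrame({
    "datetime": pd.to_datetime(["2019-06-01 12:00", "2019-06-15 12:00", "2019-06-30 12:00"]),
    "countryOrRegion": ["US", "CA", "US"],
    "temperature": [15.0, 13.0, 21.5],
})

# Keep only US stations observed after the 15th of the month.
filtered = df[(df["countryOrRegion"] == "US") & (df["datetime"].dt.day > 15)]
print(len(filtered))  # 1
```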

Azure Databricks

Language: Python
In [1]:
# This is a package in preview.
# You need to pip install azureml-opendatasets in Databricks cluster. https://docs.microsoft.com/en-us/azure/data-explorer/connect-from-databricks#install-the-python-library-on-your-azure-databricks-cluster
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
ActivityStarted, to_spark_dataframe
ActivityStarted, to_spark_dataframe_in_worker
ActivityCompleted: Activity=to_spark_dataframe_in_worker, HowEnded=Success, Duration=87171.59 [ms]
ActivityCompleted: Activity=to_spark_dataframe, HowEnded=Success, Duration=87176.63 [ms]
In [2]:
display(isd_df.limit(5))
usaf | wban | datetime | latitude | longitude | elevation | windAngle | windSpeed | temperature | seaLvlPressure | cloudCoverage | presentWeatherIndicator | pastWeatherIndicator | precipTime | precipDepth | snowDepth | stationName | countryOrRegion | p_k | year | day | version | month
726163 | 54770 | 2019-06-30T21:38:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.6 | 17.2 | null | null | 61 | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T21:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 17.2 | 1008.6 | null | null | null | 1.0 | 43.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T22:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 2.1 | 18.9 | 1008.8 | CLR | null | null | 1.0 | 0.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
726163 | 54770 | 2019-06-30T23:52:00.000+0000 | 42.805 | -72.004 | 317.0 | null | 1.5 | 18.3 | 1009.1 | FEW | null | null | 6.0 | 94.0 | null | JAFFREY MINI-SLVR RNCH APT | US | 726163-54770 | 2019 | 30 | 1.0 | 6
703260 | 25503 | 2019-06-15T07:54:00.000+0000 | 58.683 | -156.656 | 15.0 | 70 | 4.1 | 10.0 | 1005.6 | null | 61 | null | 1.0 | 0.0 | null | KING SALMON AIRPORT | US | 703260-25503 | 2019 | 15 | 1.0 | 6
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note that no data is loaded yet (evaluation is lazy)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Azure Synapse

Language: Python
In [25]:
# This is a package in preview.
from azureml.opendatasets import NoaaIsdWeather

from datetime import datetime
from dateutil.relativedelta import relativedelta


end_date = datetime.today()
start_date = datetime.today() - relativedelta(months=1)
isd = NoaaIsdWeather(start_date, end_date)
isd_df = isd.to_spark_dataframe()
In [26]:
# Display top 5 rows
display(isd_df.limit(5))
Out[26]:
In [1]:
# Azure storage access info
blob_account_name = "azureopendatastorage"
blob_container_name = "isdweatherdatacontainer"
blob_relative_path = "ISDWeather/"
blob_sas_token = r""
In [2]:
# Allow SPARK to read from Blob remotely
wasbs_path = 'wasbs://%s@%s.blob.core.windows.net/%s' % (blob_container_name, blob_account_name, blob_relative_path)
spark.conf.set(
  'fs.azure.sas.%s.%s.blob.core.windows.net' % (blob_container_name, blob_account_name),
  blob_sas_token)
print('Remote blob path: ' + wasbs_path)
In [3]:
# Read the parquet files with Spark; note that no data is loaded yet (evaluation is lazy)
df = spark.read.parquet(wasbs_path)
print('Register the DataFrame as a SQL temporary view: source')
df.createOrReplaceTempView('source')
In [4]:
# Display top 10 rows
print('Displaying top 10 rows: ')
display(spark.sql('SELECT * FROM source LIMIT 10'))

Urban Heat Islands

From the Urban Innovation Initiative at Microsoft Research, data processing and analytics scripts for hourly NOAA weather station data that produce daily urban heat island indices for hundreds of U.S. cities, January 1, 2008 - present, including automated daily updating. Urban heat island effects are then examined over time and across cities, as well as aligned with population density.