
Harmonized Landsat Sentinel-2

SatelliteImagery EarthObservation AIforEarth NASA ESA

Satellite imagery of North America from the Landsat-8 and Sentinel-2 satellites.

The Harmonized Landsat Sentinel-2 (HLS) product includes data from the Landsat-8 and Sentinel-2 satellites, aligned to a common tiling system at 30m resolution, from 2013 to the present for Landsat and from 2015 to the present for Sentinel-2. HLS is administered by the National Aeronautics and Space Administration (NASA).

This dataset is maintained by Ag-Analytics®. Ag-Analytics® also provides an API which accepts an area-of-interest (AOI) polygon, a date range, and other options, and returns processed images for individual MSI bands, Normalized Difference Vegetation Index, and other metrics, as well as cloud-filtered mosaics.

This dataset is updated weekly.

Storage resources

Data are stored in blobs in the East US 2 data center, in the following blob container:

https://hlssa.blob.core.windows.net/hls

Within that container, data are named as follows (a short path-construction sketch appears after the component list):

<folder>/HLS.<product>.T<tileid>.<daynum>.<version>_<subdataset>.tif

  • folder is L309 for Landsat and S309 for Sentinel-2
  • product is L30 for Landsat and S30 for Sentinel-2
  • tileid is a five-character tile code from the Sentinel-2 tiling system
  • daynum is a four-digit year plus a three-digit day-of-year (from 001 to 365); for example, 2019001 represents January 1, 2019
  • version is always v1.4
  • subdataset is a two-character, 1-indexed string representing a subdataset (see the table below)
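
Putting these components together, a blob URL can be built with simple string concatenation. The following is a minimal Python sketch using the example file discussed further down this page; the variable names are illustrative only:

# Assemble a blob URL for one HLS subdataset from its naming components
blob_root  = 'https://hlssa.blob.core.windows.net/hls'
folder     = 'S309'     # 'S309' for Sentinel-2, 'L309' for Landsat
product    = 'S30'      # 'S30' for Sentinel-2, 'L30' for Landsat
tile_id    = '16TDL'
daynum     = '2019206'  # four-digit year + three-digit day-of-year
version    = 'v1.4'
subdataset = '03'       # green band in the S30 product

blob_url = blob_root + '/' + folder + '/HLS.' + product + '.T' + tile_id \
    + '.' + daynum + '.' + version + '_' + subdataset + '.tif'
print(blob_url)
# https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T16TDL.2019206.v1.4_03.tif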

A mapping from latitude/longitude to tile ID can be found here; the notebook under "Data Access" demonstrates how to use that table to look up a tile ID from a latitude/longitude pair. Tile IDs can also be found using the Ag-Analytics® API.

Data are available for the following primary tiles:

['10U','11U','12U','13U','14U','15U','16U','10T','11T','12T','13T','14T','15T','16T','17T','18T','19T','10S','11S','12S','13S','14S','15S','16S','17S','18S','12R','13R','14R','15R','16R','17R']

Bands are as follows:

Band name             OLI band number   MSI band number   L30 subdataset number   S30 subdataset number
Coastal aerosol       1                 1                 01                      01
Blue                  2                 2                 02                      02
Green                 3                 3                 03                      03
Red                   4                 4                 04                      04
Red-edge 1                              5                                         05
Red-edge 2                              6                                         06
Red-edge 3                              7                                         07
NIR broad                               8                                         08
NIR narrow            5                 8A                05                      09
SWIR 1                6                 11                06                      10
SWIR 2                7                 12                07                      11
Water vapor                             9                                         12
Cirrus                9                 10                08                      13
Thermal infrared 1    10                                  09
Thermal infrared 2    11                                  10
QA                                                        11                      14
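
For convenience when building filenames, the subdataset numbers above can be captured in small lookup dictionaries. This is just an illustrative Python sketch transcribed from the table; the dictionary names are arbitrary:

# Map band names to subdataset strings for each product
# (values transcribed from the table above)
l30_subdataset = {
    'Coastal aerosol': '01', 'Blue': '02', 'Green': '03', 'Red': '04',
    'NIR narrow': '05', 'SWIR 1': '06', 'SWIR 2': '07', 'Cirrus': '08',
    'Thermal infrared 1': '09', 'Thermal infrared 2': '10', 'QA': '11'}

s30_subdataset = {
    'Coastal aerosol': '01', 'Blue': '02', 'Green': '03', 'Red': '04',
    'Red-edge 1': '05', 'Red-edge 2': '06', 'Red-edge 3': '07',
    'NIR broad': '08', 'NIR narrow': '09', 'SWIR 1': '10', 'SWIR 2': '11',
    'Water vapor': '12', 'Cirrus': '13', 'QA': '14'}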

For example, the file HLS.S30.T16TDL.2019206.v1.4_03.tif is located at https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T16TDL.2019206.v1.4_03.tif, and represents the Sentinel-2 (S30) HLS data for subdataset 03 (MSI band 3, green) of tile 16TDL (primary tile 16T, sub-tile DL) on day 206 of 2019.

We also provide a read-only SAS (shared access signature) token to allow access to HLS data via, for example, BlobFuse, which allows you to mount blob containers as drives:

st=2019-08-07T14%3A54%3A43Z&se=2050-08-08T14%3A54%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=EYNJCexDl5yxb1TxNH%2FzILznc3TiAnJq%2FPvCumkuV5U%3D

Mounting instructions for Linux are available here.
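
Besides mounting the container with BlobFuse, the SAS token can be appended to a blob URL as a query string for direct HTTP reads. A minimal Python sketch, assuming the requests package and the example blob discussed above:

import requests

sas_token = 'st=2019-08-07T14%3A54%3A43Z&se=2050-08-08T14%3A54%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=EYNJCexDl5yxb1TxNH%2FzILznc3TiAnJq%2FPvCumkuV5U%3D'
blob_url = 'https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T16TDL.2019206.v1.4_03.tif'

# Append the SAS token as a query string and download the GeoTIFF bytes
response = requests.get(blob_url + '?' + sas_token)
response.raise_for_status()
with open('HLS.S30.T16TDL.2019206.v1.4_03.tif','wb') as f:
    f.write(response.content)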

HLS data can consume hundreds of terabytes, so large-scale processing is best performed in the East US 2 Azure data center, where the imagery is stored. If you are using HLS data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Contact

For questions about this dataset, contact aiforearthdatasets@microsoft.com.

Notices

Microsoft provides Azure Open Datasets on an "as is" basis. Microsoft makes no warranties, express or implied, guarantees or conditions with respect to your use of the datasets. To the extent permitted under your local law, Microsoft disclaims all liability for any damages or losses, including direct, consequential, special, indirect, incidental or punitive, resulting from your use of the datasets.

This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.

Access

Available in: Azure Notebooks

When to use: Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Language: Python

Demo notebook for accessing HLS data on Azure

This notebook provides an example of accessing HLS (Harmonized Landsat Sentinel-2) data from blob storage on Azure, extracting image metadata using GDAL, and displaying an image using GDAL and rasterio.

HLS data are stored in the East US 2 data center, so this notebook will run more efficiently on the Azure compute located in East US 2. You don't want to download hundreds of terabytes to your laptop! If you are using HLS data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

HLS data on Azure are managed by Ag-Analytics. Ag-Analytics also provides an API that lets the caller perform spatial queries over the HLS archive, as well as query for additional data such as cloud cover and Normalized Difference Vegetation Index (NDVI), and a separate API to retrieve tile IDs matching spatial queries.

Imports and environment

In [2]:
# Standard-ish packages
import requests
import re
import numpy as np
import urllib
import io
import matplotlib.pyplot as plt
import pandas as pd

# Less standard, but all of the following are pip- or conda-installable
import rasterio

# pip install azure-storage-blob
from azure.storage.blob import ContainerClient

from osgeo import gdal,osr

# Storage locations are documented at http://aka.ms/ai4edata-hls
hls_container_name = 'hls'
hls_account_name = 'hlssa'
hls_account_url = 'https://' + hls_account_name + '.blob.core.windows.net/'
hls_blob_root = hls_account_url + hls_container_name

# This file is provided by NASA; it indicates the lat/lon extents of each
# hls tile.
#
# The file originally comes from:
#
# https://hls.gsfc.nasa.gov/wp-content/uploads/2016/10/S2_TilingSystem2-1.txt
#
# ...but as of 8/2019, there is a bug with the column names in the original file, so we
# access a copy with corrected column names.
hls_tile_extents_url = 'https://ai4edatasetspublicassets.blob.core.windows.net/assets/S2_TilingSystem2-1.txt?st=2019-08-23T03%3A25%3A57Z&se=2028-08-24T03%3A25%3A00Z&sp=rl&sv=2018-03-28&sr=b&sig=KHNZHIJuVG2KqwpnlsJ8truIT5saih8KrVj3f45ABKY%3D'

# Load this file into a table, where each row is:
#
# Tile ID, Xstart, Ystart, UZ, EPSG, MinLon, MaxLon, MinLat, MaxLat
print('Reading tile extents...')
s = requests.get(hls_tile_extents_url).content
hls_tile_extents = pd.read_csv(io.StringIO(s.decode('utf-8')),delimiter=r'\s+')
print('Read tile extents for {} tiles'.format(len(hls_tile_extents)))

# Read-only shared access signature (SAS) URL for the hls container
hls_sas_token = 'st=2019-08-07T14%3A54%3A43Z&se=2050-08-08T14%3A54%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=EYNJCexDl5yxb1TxNH%2FzILznc3TiAnJq%2FPvCumkuV5U%3D'

hls_container_client = ContainerClient(account_url=hls_account_url, 
                                         container_name=hls_container_name,
                                         credential=None)
                                

%matplotlib inline
Reading tile extents...
Read tile extents for 56686 tiles

Functions

In [3]:
def get_hls_tile(blob_url):
    """
    Given a URL pointing to an HLS image in blob storage, load that image via GDAL
    and return both data and metadata.
    """    
    
    formatted_gdal_bloburl='/{}/{}'.format('vsicurl',blob_url)
    
    tile_open = gdal.Open(formatted_gdal_bloburl)
    data = tile_open.GetRasterBand(1)
    ndv,xsize,ysize = data.GetNoDataValue(),tile_open.RasterXSize,tile_open.RasterYSize
    
    projection = osr.SpatialReference()
    projection.ImportFromWkt(tile_open.GetProjectionRef())
    
    datatype = data.DataType
    datatype = gdal.GetDataTypeName(datatype)  
    data_array = data.ReadAsArray()

    return ndv,xsize,ysize,projection,data_array


def list_available_tiles(prefix):
    """
    List all blobs in an Azure blob container matching a prefix.  
    
    We'll use this to query tiles by location and year.
    """
    
    files = []
    generator = hls_container_client.list_blobs(name_starts_with=prefix)
    for blob in generator:
        files.append(blob.name)
    return files

    
def lat_lon_to_hls_tile_id(lat,lon):
    """
    Get the hls tile ID for a given lat/lon coordinate pair
    """  
    
    found_matching_tile = False

    for i_row,row in hls_tile_extents.iterrows():
        found_matching_tile = lat >= row.MinLat and lat <= row.MaxLat \
        and lon >= row.MinLon and lon <= row.MaxLon
        if found_matching_tile:
            break
    
    if not found_matching_tile:
        return None
    else:
        return row.TilID
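
As an aside, the per-row loop above can be replaced by a vectorized pandas lookup over the same table; a minimal sketch, assuming hls_tile_extents has been loaded as in the previous cell:

def lat_lon_to_hls_tile_id_vectorized(lat,lon):
    """
    Vectorized variant of lat_lon_to_hls_tile_id using boolean indexing
    instead of a per-row loop.  Returns None if no tile contains the point.
    """
    matches = hls_tile_extents[
        (hls_tile_extents.MinLat <= lat) & (hls_tile_extents.MaxLat >= lat) &
        (hls_tile_extents.MinLon <= lon) & (hls_tile_extents.MaxLon >= lon)]
    if len(matches) == 0:
        return None
    return matches.iloc[0].TilID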

Find a tile for a given location and date

In [4]:
# Specify a location and year of interest
lat = 47.6101; lon = -122.2015 # Bellevue, WA

year = '2019'
daynum = '109'    # 1-indexed day-of-year
folder = 'S309'   # 'S309' for Sentinel, 'L309' for Landsat
product = 'S30'   # 'S30' for Sentinel, 'L30' for Landsat

tile_id = lat_lon_to_hls_tile_id(lat,lon)
assert tile_id is not None, 'Invalid lat/lon'
prefix = folder + '/HLS.' + product + '.T' + tile_id + '.' + year

print('Finding tiles with prefix {}'.format(prefix))
matches = list_available_tiles(prefix)
assert len(matches) > 0, 'No matching tiles'

blob_name = matches[0]
print('Found {} matching tiles, using file {}'.format(len(matches),blob_name))
Finding tiles with prefix S309/HLS.S30.T10TET.2019
Found 1918 matching tiles, using file S309/HLS.S30.T10TET.2019001.v1.4_01.tif
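
To narrow the listing to a single acquisition rather than a whole year, the daynum defined above can be appended to the prefix, following the naming convention described earlier. A small sketch reusing the variables from the cell above (not every day has an acquisition, so the result may be empty):

# Narrow the prefix to a specific day-of-year (here, day 109 of 2019)
daily_prefix = folder + '/HLS.' + product + '.T' + tile_id + '.' + year + daynum
daily_matches = list_available_tiles(daily_prefix)
print('Found {} tiles for day {}'.format(len(daily_matches),daynum))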

...or build a tile path from components

In [5]:
lat = 47.6101; lon = -122.2015 # Bellevue, WA

year    = '2019'
daynum  = '001'   # 1-indexed day-of-year
folder  = 'S309'  # 'S309' for Sentinel, 'L309' for Landsat
product = 'S30'   # 'S30' for Sentinel, 'L30' for Landsat
band    = '01'
tile_id = '10TET' # See hls.gsfc.nasa.gov/wp-content/uploads/2016/10/S2_TilingSystem2-1.txt
version = 'v1.4'  # Currently always v1.4

blob_name = folder + '/HLS.' + product + '.T' + tile_id + '.' + year + daynum + '.' + version \
    + '_' + band + '.tif'

print('Using file {}'.format(blob_name))
Using file S309/HLS.S30.T10TET.2019001.v1.4_01.tif

Access one band of the selected image using GDAL's virtual file system (vsicurl)

In [6]:
gdal.SetConfigOption('GDAL_HTTP_UNSAFESSL', 'YES')
blob_url = hls_blob_root + '/' + blob_name
print('Reading tile from {}'.format(blob_url))
ndv,xsize,ysize,projection,data_array = get_hls_tile(blob_url)

print('No-data value: {}'.format(ndv))
print('\nSize: {},{}'.format(xsize,ysize))
print('\nProjection:\n{}'.format(projection))
Reading tile from https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T10TET.2019001.v1.4_01.tif
No-data value: -1000.0

Size: 3660,3660

Projection:
PROJCS["UTM Zone 10, Northern Hemisphere",
    GEOGCS["Unknown datum based upon the WGS 84 ellipsoid",
        DATUM["Not_specified_based_on_WGS_84_spheroid",
            SPHEROID["WGS 84",6378137,298.257223563,
                AUTHORITY["EPSG","7030"]]],
        PRIMEM["Greenwich",0],
        UNIT["degree",0.0174532925199433]],
    PROJECTION["Transverse_Mercator"],
    PARAMETER["latitude_of_origin",0],
    PARAMETER["central_meridian",-123],
    PARAMETER["scale_factor",0.9996],
    PARAMETER["false_easting",500000],
    PARAMETER["false_northing",0],
    UNIT["metre",1,
        AUTHORITY["EPSG","9001"]]]

Display a Sentinel-2 image using rasterio and vsicurl

In [7]:
# Bands 2, 3, and 4 are B, G, and R in Sentinel-2 HLS images

base_url = '/vsicurl/' + hls_blob_root + '/' + blob_name
band2_url = re.sub(r'_(\d+)\.tif','_02.tif',base_url)
band3_url = re.sub(r'_(\d+)\.tif','_03.tif',base_url)
band4_url = re.sub(r'_(\d+)\.tif','_04.tif',base_url)
print('Reading bands from:\n{}\n{}\n{}'.format(band2_url,band3_url,band4_url))

band2 = rasterio.open(band2_url)
band3 = rasterio.open(band3_url)
band4 = rasterio.open(band4_url)

norm_value = 2000
image_data = []
for band in [band4,band3,band2]:
    band_array = band.read(1)
    band_array = band_array / norm_value
    image_data.append(band_array)
    band.close()

rgb = np.dstack((image_data[0],image_data[1],image_data[2]))
np.clip(rgb,0,1,rgb)
plt.imshow(rgb)
Reading bands from:
/vsicurl/https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T10TET.2019001.v1.4_02.tif
/vsicurl/https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T10TET.2019001.v1.4_03.tif
/vsicurl/https://hlssa.blob.core.windows.net/hls/S309/HLS.S30.T10TET.2019001.v1.4_04.tif
Out[7]:
<matplotlib.image.AxesImage at 0x2249d8e6f48>
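
To persist the composite rather than just display it, the clipped array can be written to disk with matplotlib; a minimal sketch, with an arbitrary output filename:

# Save the clipped RGB composite (values in [0,1]) to a local PNG
plt.imsave('hls_rgb_composite.png', rgb)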