Harmonized Landsat Sentinel-2
Satellite imagery from the Landsat 8 and Sentinel-2 satellites, aligned to a common grid and processed to compatible color spaces.
The Harmonized Landsat Sentinel-2 (HLS) product includes data from the Landsat 8 and Sentinel-2 satellites, aligned to a common tiling system at 30m resolution, from 2013 to the present for Landsat and from 2015 to the present for Sentinel-2. HLS is administered by the National Aeronautics and Space Administration (NASA).
This dataset is maintained by Ag-Analytics®. Ag-Analytics® also provides an API which accepts an area of interest (AOI) polygon, date range, and other options, and returns processed images for individual MSI bands, as well as Normalized Difference Vegetation Index (NDVI), other derived metrics, and cloud-filtered mosaics.
This dataset is updated weekly.
Storage resources
Data are stored as cloud-optimized GeoTIFF files in the East US 2 data center, in the following blob container:
https://hlssa.blob.core.windows.net/hls
Within that container, data are organized according to:
<folder>/HLS.<product>.T<tileid>.<daynum>.<version>_<subdataset>.tif
- folder is L30 for Landsat, S30 for Sentinel-2
- product is L30 for Landsat, S30 for Sentinel-2
- tileid is a five-character tile code from the Sentinel-2 tiling system
- daynum is a four-digit year plus a three-digit day of year (from 001 to 365, or 366 in leap years), e.g. 2019001 represents January 1, 2019
- version is always v1.4
- subdataset is a two-character, 1-indexed string indicating a subdataset (see below)
A mapping from lat/lon to tile IDs is available at https://hls.gsfc.nasa.gov/wp-content/uploads/2016/10/S2_TilingSystem2-1.txt; the notebook provided under “data access” demonstrates the use of this table to look up a tile ID by lat/lon. Tile IDs can also be found using the Ag-Analytics® API.
Data are provided for the United States, northern Mexico, southern Canada, France, Ireland, Germany, Ukraine, South Africa, and southeastern Australia; see the HLS coverage map for coverage areas.
Bands are as follows:
Band name | OLI band number | MSI band number | L30 subdataset number | S30 subdataset number |
---|---|---|---|---|
Coastal aerosol | 1 | 1 | 01 | 01 |
Blue | 2 | 2 | 02 | 02 |
Green | 3 | 3 | 03 | 03 |
Red | 4 | 4 | 04 | 04 |
Red-edge 1 | | 5 | | 05 |
Red-edge 2 | | 6 | | 06 |
Red-edge 3 | | 7 | | 07 |
NIR broad | | 8 | | 08 |
NIR narrow | 5 | 8A | 05 | 09 |
SWIR 1 | 6 | 11 | 06 | 10 |
SWIR 2 | 7 | 12 | 07 | 11 |
Water vapor | | 9 | | 12 |
Cirrus | 9 | 10 | 08 | 13 |
Thermal infrared 1 | 10 | | 09 | |
Thermal infrared 2 | 11 | | 10 | |
QA | | | 11 | 14 |
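For programmatic lookups, the band table above can be transcribed into code. The following is a minimal sketch in Python; the dictionary names are our own, for illustration, and are not part of the HLS product:

# Band name -> subdataset string for HLS v1.4, transcribed from the table above
# (illustrative names, not part of the HLS product)
l30_subdatasets = {
    'Coastal aerosol': '01', 'Blue': '02', 'Green': '03', 'Red': '04',
    'NIR narrow': '05', 'SWIR 1': '06', 'SWIR 2': '07', 'Cirrus': '08',
    'Thermal infrared 1': '09', 'Thermal infrared 2': '10', 'QA': '11'
}
s30_subdatasets = {
    'Coastal aerosol': '01', 'Blue': '02', 'Green': '03', 'Red': '04',
    'Red-edge 1': '05', 'Red-edge 2': '06', 'Red-edge 3': '07',
    'NIR broad': '08', 'NIR narrow': '09', 'SWIR 1': '10', 'SWIR 2': '11',
    'Water vapor': '12', 'Cirrus': '13', 'QA': '14'
}
# E.g., the green band of an S30 tile is subdataset s30_subdatasets['Green'], i.e. '03'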
For example, the filename HLS.S30.T16TDL.2019206.v1.4_03.tif would be located at https://hlssa.blob.core.windows.net/hls/S30/HLS.S30.T16TDL.2019206.v1.4_03.tif and would represent Sentinel-2 (S30) HLS data for tile 16TDL (primary tile 16T, sub-tile DL), subdataset 03 (MSI band 3, green), for the 206th day of 2019.
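As a sketch, that URL can be assembled from its path components in Python; the variable names here are our own:

# Assemble an HLS blob URL from its path components (illustrative variable names)
blob_root = 'https://hlssa.blob.core.windows.net/hls'
folder = 'S30'; product = 'S30'; tile_id = '16TDL'
daynum = '2019206'  # year 2019, day of year 206
version = 'v1.4'; subdataset = '03'
blob_url = '{}/{}/HLS.{}.T{}.{}.{}_{}.tif'.format(
    blob_root, folder, product, tile_id, daynum, version, subdataset)
# blob_url == 'https://hlssa.blob.core.windows.net/hls/S30/HLS.S30.T16TDL.2019206.v1.4_03.tif'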
We also provide a read-only SAS (shared access signature) token to allow access to HLS data via, e.g., BlobFuse, which allows you to mount blob containers as drives:
?sv=2019-12-12&si=hls-ro&sr=c&sig=g5Pe8pV1%2Fo6ZttXhcnAz66ufwkeGBmwyc2PgnLirl4w%3D
Mounting instructions for Linux are available in the BlobFuse documentation.
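The token can also be used for read access from Python, for example with the azure-storage-blob package. This is a minimal sketch; anonymous access also works for this public container, as the notebook below demonstrates:

# Read-only access to the HLS container using the SAS token above
from azure.storage.blob import ContainerClient
sas_token = 'sv=2019-12-12&si=hls-ro&sr=c&sig=g5Pe8pV1%2Fo6ZttXhcnAz66ufwkeGBmwyc2PgnLirl4w%3D'
hls_container = ContainerClient(account_url='https://hlssa.blob.core.windows.net/',
                                container_name='hls',
                                credential=sas_token)
# List one day's worth of Sentinel-2 imagery for tile 16TDL
for blob in hls_container.list_blobs(name_starts_with='S30/HLS.S30.T16TDL.2019206'):
    print(blob.name)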
HLS data can consume hundreds of terabytes, so large-scale processing is best performed in the East US 2 Azure data center where the images are stored. If you are using HLS data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.
Contact
For questions about this dataset, contact aiforearthdatasets@microsoft.com.
Notices
MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
This dataset is provided under the original terms under which Microsoft received the source data. The dataset may include data sourced from Microsoft.
Access
Available in | When to use |
---|---|
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
Azure Notebooks
# Standard-ish packages
import requests
import re
import numpy as np
import urllib
import io
import matplotlib.pyplot as plt
import pandas as pd
# Less standard, but all of the following are pip- or conda-installable
import rasterio
# pip install azure-storage-blob
from azure.storage.blob import ContainerClient
from osgeo import gdal,osr
# Storage locations are documented at http://aka.ms/ai4edata-hls
hls_container_name = 'hls'
hls_account_name = 'hlssa'
hls_account_url = 'https://' + hls_account_name + '.blob.core.windows.net/'
hls_blob_root = hls_account_url + hls_container_name
# This file is provided by NASA; it indicates the lat/lon extents of each
# hls tile.
#
# The file originally comes from:
#
# https://hls.gsfc.nasa.gov/wp-content/uploads/2016/10/S2_TilingSystem2-1.txt
#
# ...but as of 8/2019, there is a bug with the column names in the original file, so we
# access a copy with corrected column names.
hls_tile_extents_url = 'https://ai4edatasetspublicassets.blob.core.windows.net/assets/S2_TilingSystem2-1.txt'
# Load this file into a table, where each row is:
#
# Tile ID, Xstart, Ystart, UZ, EPSG, MinLat, MaxLat, MinLon, MaxLon
print('Reading tile extents...')
s = requests.get(hls_tile_extents_url).content
hls_tile_extents = pd.read_csv(io.StringIO(s.decode('utf-8')),delimiter=r'\s+')
print('Read tile extents for {} tiles'.format(len(hls_tile_extents)))
hls_container_client = ContainerClient(account_url=hls_account_url,
container_name=hls_container_name,
credential=None)
%matplotlib inline
def get_hls_tile(blob_url):
    """
    Given a URL pointing to an HLS image in blob storage, load that image via GDAL
    and return both data and metadata.
    """
    # GDAL's /vsicurl/ prefix lets us read remote files over HTTP
    formatted_gdal_bloburl = '/{}/{}'.format('vsicurl', blob_url)
    tile_open = gdal.Open(formatted_gdal_bloburl)
    data = tile_open.GetRasterBand(1)
    ndv, xsize, ysize = data.GetNoDataValue(), tile_open.RasterXSize, tile_open.RasterYSize
    projection = osr.SpatialReference()
    projection.ImportFromWkt(tile_open.GetProjectionRef())
    data_array = data.ReadAsArray()
    return ndv, xsize, ysize, projection, data_array
def list_available_tiles(prefix):
"""
List all blobs in an Azure blob container matching a prefix.
We'll use this to query tiles by location and year.
"""
files = []
generator = hls_container_client.list_blobs(name_starts_with=prefix)
for blob in generator:
files.append(blob.name)
return files
def lat_lon_to_hls_tile_id(lat,lon):
"""
Get the hls tile ID for a given lat/lon coordinate pair.
"""
found_matching_tile = False
for i_row,row in hls_tile_extents.iterrows():
found_matching_tile = lat >= row.MinLat and lat <= row.MaxLat \
and lon >= row.MinLon and lon <= row.MaxLon
if found_matching_tile:
break
if not found_matching_tile:
return None
else:
return row.TilID
# Specify a location and year of interest
lat = 47.6101; lon = -122.2015  # Bellevue, WA
year = '2019'
daynum = '109'   # 1-indexed day-of-year
folder = 'S30'   # 'S30' for Sentinel-2, 'L30' for Landsat
product = 'S30'  # 'S30' for Sentinel-2, 'L30' for Landsat
tile_id = lat_lon_to_hls_tile_id(lat,lon)
assert tile_id is not None, 'Invalid lat/lon'
prefix = folder + '/HLS.' + product + '.T' + tile_id + '.' + year
print('Finding tiles with prefix {}'.format(prefix))
matches = list_available_tiles(prefix)
assert len(matches) > 0, 'No matching tiles'
blob_name = matches[0]
print('Found {} matching tiles, using file {}'.format(len(matches),blob_name))
lat = 47.6101; lon = -122.2015 # Bellevue, WA
year = '2019'
daynum = '001' # 1-indexed day-of-year
folder = 'S30' # 'S30' for Sentinel, 'L30' for Landsat
product = 'S30' # 'S30' for Sentinel, 'L30' for Landsat
band = '01'
tile_id = '10TET' # See https://hls.gsfc.nasa.gov/wp-content/uploads/2016/10/S2_TilingSystem2-1.txt
version = 'v1.4' # Currently always v1.4
blob_name = folder + '/HLS.' + product + '.T' + tile_id + '.' + year + daynum + '.' + version \
+ '_' + band + '.tif'
print('Using file {}'.format(blob_name))
gdal.SetConfigOption('GDAL_HTTP_UNSAFESSL', 'YES')
blob_url = hls_blob_root + '/' + blob_name
print('Reading tile from {}'.format(blob_url))
ndv,xsize,ysize,projection,data_array = get_hls_tile(blob_url)
print('No-data value: {}'.format(ndv))
print('\nSize: {},{}'.format(xsize,ysize))
print('\nProjection:\n{}'.format(projection))
# Bands 2, 3, and 4 are B, G, and R in Sentinel-2 HLS images
base_url = '/vsicurl/' + hls_blob_root + '/' + blob_name
band2_url = re.sub(r'_(\d+)\.tif', '_02.tif', base_url)
band3_url = re.sub(r'_(\d+)\.tif', '_03.tif', base_url)
band4_url = re.sub(r'_(\d+)\.tif', '_04.tif', base_url)
print('Reading bands from:\n{}\n{}\n{}'.format(band2_url,band3_url,band4_url))
band2 = rasterio.open(band2_url)
band3 = rasterio.open(band3_url)
band4 = rasterio.open(band4_url)
norm_value = 2000
image_data = []
for band in [band4,band3,band2]:
band_array = band.read(1)
band_array = band_array / norm_value
image_data.append(band_array)
band.close()
rgb = np.dstack((image_data[0],image_data[1],image_data[2]))
np.clip(rgb,0,1,rgb)
plt.imshow(rgb)
rgb_urls = [band4_url, band3_url, band2_url]
thumbnail_data = []
for url in rgb_urls:
# From:
#
# https://automating-gis-processes.github.io/CSC/notebooks/L5/read-cogs.html
with rasterio.open(url) as raster:
# List of overviews from largest to smallest
oviews = raster.overviews(1)
# Retrieve the second-largest thumbnail
decimation_level = oviews[1]
h = int(raster.height/decimation_level)
w = int(raster.width/decimation_level)
thumbnail_channel = raster.read(1, out_shape=(1, h, w)) / norm_value
thumbnail_data.append(thumbnail_channel)
rgb = np.dstack((thumbnail_data[0],thumbnail_data[1],thumbnail_data[2]))
np.clip(rgb,0,1,rgb)
plt.imshow(rgb)