跳过导航

NAIP

AerialImagery AIforEarth USDA

美国国家农业影像计划 (NAIP) 中的航拍图像。

NAIP 提供全美的高分辨率航拍图像。 此计划由美国农业部 (USDA) 航空领域摄影办公室 (AFPO) 管理。 可提供 2010 年至今的数据。

存储资源

数据以经过云优化的 GeoTIFF 文件形式存储在美国东部数据中心 Azure Blob 存储中的以下 blob 容器中:

https://naipblobs.blob.core.windows.net/naip

在该容器内,数据构成方式如下:

v002/[state]/[year]/[state]_[resolution]_[year]/[quadrangle]/filename

…例如:

v002/al/2015/al_100cm_2015/30086/m_3008601_ne_16_1_20150804.tif

关于这些字段的更多详细信息:

  • 年份:四位数年份。 各州每 3-5 年收集一次图像,任何给定年份包含一些(但非所有)州。 例如,阿拉巴马州有 2011 年和 2013 年的数据,但没有 2012 年的数据,而加利福尼亚州有 2012 年的数据,但没有 2011 年或 2013 年的数据。 Esri 在其交互式 NAIP 年度覆盖范围地图中提供了关于 NAIP 覆盖范围的信息。
  • :双字母州代码。
  • 分辨率:字符串的图像分辨率规范,在NAIP 的不同历史阶段均有所变动’。 根据年份和状态,其值可能为 “050cm”、“060cm” 或 “100cm”。
  • 四边形USGS 四边形标识符,指定 7.5 分 x 7.5 分的区域。

文件存储为经过云优化的 GeoTIFF 图像,扩展名为 .tif。 这些文件由 Esri 生成(根据美国农业部所提供的初始格式)和整理。

每个图像还可使用小缩略图;可通过用 “.200.jpg” 替换 “.tif” 来检索缩略图。 例如:

https://naipblobs.blob.core.windows.net/naip/v002/al/2013/al_100cm_2013/30086/m_3008601_ne_16_1_20130928.200.jpg

可访问“数据访问”下提供的笔记本,找到访问 NAIP 图像并据其绘图的完整 Python 示例。

我们还提供只读 SAS(共享访问签名)令牌,以通过 BlobFuse 等访问 NAIP 数据,BlobFuse 可将 blob 容器装载为驱动器:

st=2019-07-18T03%3A53%3A22Z&se=2035-07-19T03%3A53%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=2RIXmLbLbiagYnUd49rgx2kOXKyILrJOgafmkODhRAQ%3D

此处提供适用于 Linux 的装载说明。

NAIP 数据可消耗数百兆兆字节,因此最好在图像存储地美国东部 Azure 数据中心执行大规模处理。 如果将 NAIP 数据用于环境科学应用,请考虑申请 AI for Earth 许可以满足计算需求。

索引

此处提供压缩的 .csv 文件形式的所有 NAIP 文件的列表:

https://naipblobs.blob.core.windows.net/naip-index/naip_v002_index.zip

我们还维护了一个 rtree 对象,可方便 Python 用户进行空间查询;有关详细信息,请参见示例笔记本

也可在此处浏览数据。

.mrf 文件到哪里去了?

2020 年 6 月,我们更新了整个 NAIP 存档,以提高覆盖率和可维护性。 我们还从 .mrf 格式转换为经过云优化的 GeoTIFF 格式,并对路径结构进行了一些更改。 .mrf 文件暂时仍可在另一容器中使用;如果它们对你的工作来说很重要,且你需要访问它们,请联系 aiforearthdatasets@microsoft.com

精致的图片


2017 年 Microsoft Redmond 园区附近地区的 1m 分辨率图像’。

联系人

若有关于此数据集的任何疑问,请联系 aiforearthdatasets@microsoft.com

Access

Available inWhen to use
Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Select your preferred service:

Azure Notebooks

Azure Notebooks

Package: Language: Python

Demo notebook for accessing NAIP data on Azure

This notebook provides an example of accessing NAIP data from blob storage on Azure, displaying an image using the rasterio library.

We will demonstrate how to access and plot a tile given a known tile filename, as well as how to access tiles by lat/lon. Finally, we'll demonstrate how to retrieve only the patches you care about from our cloud-optimized image files.

NAIP data are stored in the East US data center, so this notebook will run most efficiently on Azure compute located in East US. We recommend that substantial computation depending on NAIP data also be situated in East US. You don't want to download hundreds of terabytes to your laptop! If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Imports and environment

In [1]:
# Standard packages
import tempfile
import warnings
import urllib
import shutil
import os

# Workaround for a problem in older rasterio versions
os.environ["CURL_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt" 

# Less standard, but still pip- or conda-installable
import matplotlib.pyplot as plt
import numpy as np
import rasterio
import rtree
import shapely
import pickle

# pip install progressbar2, not progressbar
import progressbar

from geopy.geocoders import Nominatim
from rasterio.windows import Window 
from tqdm import tqdm

latest_wkid = 3857
crs = "EPSG:4326"

# Storage locations are documented at http://aka.ms/ai4edata-naip
blob_root = 'https://naipblobs.blob.core.windows.net/naip'

index_files = ["tile_index.dat", "tile_index.idx", "tiles.p"]
index_blob_root = 'https://naipblobs.blob.core.windows.net/naip-index/rtree/'
temp_dir = os.path.join(tempfile.gettempdir(),'naip')
os.makedirs(temp_dir,exist_ok=True)

# Spatial index that maps lat/lon to NAIP tiles; we'll load this when we first 
# need to access it.
index = None

# URL where we've stashed a geojson file with the boundaries of Maryland.  Why do we
# need the boundaries of Maryland?  It's a surprise, you'll have to keep reading to find
# out.
maryland_boundary_url = 'https://ai4edatasetspublicassets.blob.core.windows.net/assets/maryland.json'

warnings.filterwarnings("ignore")
%matplotlib inline

Functions

In [2]:
class DownloadProgressBar():
    """
    https://stackoverflow.com/questions/37748105/how-to-use-progressbar-module-with-urlretrieve
    """
    
    def __init__(self):
        self.pbar = None

    def __call__(self, block_num, block_size, total_size):
        if not self.pbar:
            self.pbar = progressbar.ProgressBar(max_value=total_size)
            self.pbar.start()
            
        downloaded = block_num * block_size
        if downloaded < total_size:
            self.pbar.update(downloaded)
        else:
            self.pbar.finish()
            

class NAIPTileIndex:
    """
    Utility class for performing NAIP tile lookups by location.
    """
    
    tile_rtree = None
    tile_index = None
    base_path = None
    
    def __init__(self, base_path=None):
        
        if base_path is None:
            
            base_path = temp_dir
            os.makedirs(base_path,exist_ok=True)
            
            for file_path in index_files:
                download_url(index_blob_root + file_path, base_path + '/' + file_path,
                             progress_updater=DownloadProgressBar())
                
        self.base_path = base_path
        self.tile_rtree = rtree.index.Index(base_path + "/tile_index")
        self.tile_index = pickle.load(open(base_path  + "/tiles.p", "rb"))
      
    
    def lookup_tile(self, lat, lon):
        """"
        Given a lat/lon coordinate pair, return the list of NAIP tiles that contain
        that location.

        Returns an array containing [mrf filename, idx filename, lrc filename].
        """

        point = shapely.geometry.Point(float(lon),float(lat))
        intersected_indices = list(self.tile_rtree.intersection(point.bounds))

        intersected_files = []
        tile_intersection = False

        for idx in intersected_indices:

            intersected_file = self.tile_index[idx][0]
            intersected_geom = self.tile_index[idx][1]
            if intersected_geom.contains(point):
                tile_intersection = True
                intersected_files.append(intersected_file)

        if not tile_intersection and len(intersected_indices) > 0:
            print('''Error: there are overlaps with tile index, 
                      but no tile completely contains selection''')   
            return None
        elif len(intersected_files) <= 0:
            print("No tile intersections")
            return None
        else:
            return intersected_files
        
            
def download_url(url, destination_filename=None, progress_updater=None, force_download=False):
    """
    Download a URL to a temporary file
    """
    
    # This is not intended to guarantee uniqueness, we just know it happens to guarantee
    # uniqueness for this application.
    if destination_filename is None:
        url_as_filename = url.replace('://', '_').replace('/', '_')    
        destination_filename = \
            os.path.join(temp_dir,url_as_filename)
    if (not force_download) and (os.path.isfile(destination_filename)):
        print('Bypassing download of already-downloaded file {}'.format(os.path.basename(url)))
        return destination_filename
    print('Downloading file {} to {}'.format(os.path.basename(url),destination_filename),end='')
    urllib.request.urlretrieve(url, destination_filename, progress_updater)  
    assert(os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    print('...done, {} bytes.'.format(nBytes))
    return destination_filename
    

def display_naip_tile(filename):
    """
    Display a NAIP tile using rasterio.
    
    For .mrf-formatted tiles (which span multiple files), 'filename' should refer to the 
    .mrf file.
    """
    
    # NAIP tiles are enormous; downsize for plotting in this notebook
    dsfactor = 10
    
    with rasterio.open(filename) as raster:

        # NAIP imagery has four channels: R, G, B, IR
        #
        # Stack RGB channels into an image; we won't try to render the IR channel
        #
        # rasterio uses 1-based indexing for channels.
        h = int(raster.height/dsfactor)
        w = int(raster.width/dsfactor)
        print('Resampling to {},{}'.format(h,w))
        r = raster.read(1, out_shape=(1, h, w))
        g = raster.read(2, out_shape=(1, h, w))
        b = raster.read(3, out_shape=(1, h, w))        
    
    rgb = np.dstack((r,g,b))
    fig = plt.figure(figsize=(7.5, 7.5), dpi=100, edgecolor='k')
    plt.imshow(rgb)
    raster.close()
    
    
def get_coordinates_from_address(address):
    """
    Look up the lat/lon coordinates for an address.
    """
    
    geolocator = Nominatim(user_agent="NAIP")
    location = geolocator.geocode(address)
    print('Retrieving location for address:\n{}'.format(location.address))
    return location.latitude, location.longitude

Access and plot a NAIP tile by constructing a path

In [3]:
# Tiles are stored at:
#
# [blob root]/v002/[state]/[year]/[state]_[resolution]_[year]/[quadrangle]/filename

year = '2015'
state = 'al'
resolution = '100cm'
quadrangle = '30086'
filename = 'm_3008601_ne_16_1_20150804.tif'
tile_url = blob_root + '/v002/' + state + '/' + year + '/' + state + '_' + resolution + \
    '_' + year + '/' + quadrangle + '/' + filename

print(tile_url)

# Download the image
image_filename = download_url(tile_url,progress_updater=DownloadProgressBar())

# Plot the image
print('Reading file:\n{}'.format(os.path.basename(image_filename)))
assert os.path.isfile(image_filename)
display_naip_tile(image_filename)
https://naipblobs.blob.core.windows.net/naip/v002/al/2015/al_100cm_2015/30086/m_3008601_ne_16_1_20150804.tif
Bypassing download of already-downloaded file m_3008601_ne_16_1_20150804.tif
Reading file:
https_naipblobs.blob.core.windows.net_naip_v002_al_2015_al_100cm_2015_30086_m_3008601_ne_16_1_20150804.tif
Resampling to 753,657