
NAIP

AerialImagery AIforEarth USDA

This dataset contains aerial imagery from the National Agricultural Imagery Program (NAIP).

NAIP provides US-wide, high-resolution aerial imagery. The program is administered by the Aerial Photography Field Office (APFO) within the US Department of Agriculture (USDA). NAIP imagery is used for agricultural planning and for a variety of land-use classification applications.

Storage resources

Data are stored as blobs in the East US data center, in the following blob container:

https://naipblobs.blob.core.windows.net/naip

Within that container, data are organized according to:

data/v1/[year]/states/[state]/[state]_[resolution]_[year]/[quadrangle]/filename

…for example:

data/v1/2011/states/al/al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf

More details on these fields:

  • Year: Four-digit year. Data are collected in each state every 3-5 years, with any given year containing some (but not all) states. For example, Alabama has data in 2011 and 2013 but not in 2012, while California has data in 2012 but not in 2011 or 2013. Esri provides information about NAIP coverage in their interactive NAIP annual coverage map.
  • State: Two-letter state code.
  • Resolution: String specification of the resolution; “1m” in all the data currently available in this container, but subject to change.
  • Quadrangle: USGS quadrangle identifier, specifying a 7.5-minute × 7.5-minute area.
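The layout above can be sketched as a pair of helpers: one builds a blob path from its fields, the other splits a path back into them. This is a minimal illustration, not part of any official NAIP tooling; the function names are hypothetical, and lowercasing the state code is an assumption based on the example path.

```python
import posixpath

# Container URL given above
BLOB_ROOT = "https://naipblobs.blob.core.windows.net/naip"


def naip_blob_path(year, state, resolution, quadrangle, filename):
    """Build data/v1/[year]/states/[state]/[state]_[resolution]_[year]/[quadrangle]/filename."""
    state = state.lower()  # assumption: state folders use lowercase codes, as in "al"
    return posixpath.join(
        "data", "v1", str(year), "states", state,
        f"{state}_{resolution}_{year}", str(quadrangle), filename,
    )


def parse_naip_blob_path(path):
    """Split a blob path back into its year/state/resolution/quadrangle/filename fields."""
    parts = path.split("/")
    if len(parts) != 8 or parts[:2] != ["data", "v1"] or parts[3] != "states":
        raise ValueError(f"not a NAIP blob path: {path!r}")
    _, _, year, _, state, folder, quadrangle, filename = parts
    # The folder name wraps the resolution string in "[state]_" and "_[year]"
    prefix, suffix = f"{state}_", f"_{year}"
    if not (folder.startswith(prefix) and folder.endswith(suffix)):
        raise ValueError(f"inconsistent folder name: {folder!r}")
    resolution = folder[len(prefix):-len(suffix)]
    return {"year": year, "state": state, "resolution": resolution,
            "quadrangle": quadrangle, "filename": filename}


path = naip_blob_path(2011, "al", "1m", "30085", "m_3008501_ne_16_1_20110815.mrf")
print(BLOB_ROOT + "/" + path)
print(parse_naip_blob_path(path))
```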

Files are stored as .mrf (Meta Raster Format) images (format spec). Each image is represented by three files: an .mrf metadata file in XML format, a binary index (.idx) file, and an .lrc file containing the pixel data. Esri produced these files from the original USDA-provided GeoTIFFs and organized them. The .mrf format is cloud-optimized and is supported by GDAL.
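Because one image is three blobs, code that fetches an image needs all three names. A minimal sketch, assuming (as the notebook's download_naip_tile function also does) that the .idx and .lrc files sit next to the .mrf file and share its base name; `naip_image_files` is a hypothetical helper name.

```python
import posixpath

# The three extensions that together make up one NAIP image
MRF_COMPANION_EXTENSIONS = (".mrf", ".idx", ".lrc")


def naip_image_files(mrf_path):
    """Return the .mrf, .idx, and .lrc paths (or URLs) for one NAIP image."""
    base, ext = posixpath.splitext(mrf_path)
    if ext.lower() != ".mrf":
        raise ValueError(f"expected an .mrf path, got {mrf_path!r}")
    # Swap only the final extension; a '.mrf' elsewhere in the path is left alone
    return [base + e for e in MRF_COMPANION_EXTENSIONS]


url = ("https://naipblobs.blob.core.windows.net/naip/data/v1/2011/states/al/"
       "al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf")
for f in naip_image_files(url):
    print(f)
```

Splitting off only the final extension avoids rewriting an unrelated ".mrf" elsewhere in the path, which a plain `str.replace` would do.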

We also provide a read-only SAS (shared access signature) token to allow access to NAIP data via, e.g., BlobFuse, which allows you to mount blob containers as drives:

st=2019-07-18T03%3A53%3A22Z&se=2035-07-19T03%3A53%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=2RIXmLbLbiagYnUd49rgx2kOXKyILrJOgafmkODhRAQ%3D

Mounting instructions for Linux are here.
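For direct HTTP access rather than a BlobFuse mount, a SAS token is appended to a blob URL as its query string. The sketch below only builds URLs and decodes the token's fields (st and se are the start and expiry times, sp=rl grants read and list permissions, sr=c scopes it to the container); `with_sas` is a hypothetical helper, and nothing here contacts Azure.

```python
from urllib.parse import parse_qs, urlsplit, urlunsplit

# Read-only SAS token quoted above (no leading '?')
NAIP_SAS_TOKEN = ("st=2019-07-18T03%3A53%3A22Z&se=2035-07-19T03%3A53%3A00Z&sp=rl"
                  "&sv=2018-03-28&sr=c&sig=2RIXmLbLbiagYnUd49rgx2kOXKyILrJOgafmkODhRAQ%3D")


def with_sas(blob_url, sas_token=NAIP_SAS_TOKEN):
    """Append a SAS token to a blob URL, keeping any query string already present."""
    token = sas_token.lstrip("?")
    scheme, netloc, path, query, fragment = urlsplit(blob_url)
    query = f"{query}&{token}" if query else token
    return urlunsplit((scheme, netloc, path, query, fragment))


# parse_qs percent-decodes the values, e.g. %3A -> ':'
token_fields = parse_qs(NAIP_SAS_TOKEN)
print("permissions:", token_fields["sp"][0])
print("expires:", token_fields["se"][0])
print(with_sas(BLOB := "https://naipblobs.blob.core.windows.net/naip/data/v1/2011/"
               "states/al/al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf"))
```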

The full NAIP archive occupies hundreds of terabytes, so large-scale processing is best performed in the East US Azure data center, where the images are stored. If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Index

A list of all NAIP files is available here, as a zipped .txt file:

https://naipblobs.blob.core.windows.net/naip-index/naip-index.zip
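The index archive can be read without unpacking it to disk. A minimal sketch, assuming the archive holds one or more UTF-8 .txt files with one blob path (or URL) per line. The exact line format is not documented here, so `select_mrf` matches on the /[year]/states/[state]/ segment of the layout described above rather than on fixed positions. Both function names are hypothetical.

```python
import io
import zipfile


def read_naip_index(zip_source):
    """Yield the non-empty lines of every .txt member of the index archive.

    zip_source may be a local file path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".txt"):
                continue
            with archive.open(name) as raw:
                # Assumption: the listing is UTF-8 text, one entry per line
                for line in io.TextIOWrapper(raw, encoding="utf-8"):
                    line = line.strip()
                    if line:
                        yield line


def select_mrf(paths, year=None, state=None):
    """Keep .mrf entries, optionally restricted to one year and/or one state.

    Matching on the /[year]/states/[state]/ segment works whether an entry
    is a relative blob path or a full URL.
    """
    for path in paths:
        if not path.endswith(".mrf"):
            continue
        if year is not None and f"/{year}/states/" not in path:
            continue
        if state is not None and f"/states/{state.lower()}/" not in path:
            continue
        yield path


# Example (downloads the archive first):
# import urllib.request
# urllib.request.urlretrieve(
#     "https://naipblobs.blob.core.windows.net/naip-index/naip-index.zip", "naip-index.zip")
# al_2011 = list(select_mrf(read_naip_index("naip-index.zip"), year=2011, state="al"))
```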

We also maintain a SQLite database to facilitate querying images by location and time; see the sample notebook for details.

Data can also be browsed here.

Contact

For questions about this dataset, contact aiforearthdatasets@microsoft.com.

Access

Available in: Azure Notebooks

Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.


Language: Python

Demo notebook for accessing NAIP data on Azure

This notebook provides an example of accessing NAIP data from blob storage on Azure, and displaying a NAIP image using rasterio.

We will demonstrate how to access and plot a tile given a known tile filename, as well as how to access tiles by lat/lon.

NAIP data are stored in the East US data center, so this notebook will run most efficiently on Azure compute located in East US. We recommend that substantial computation depending on NAIP data also be situated in East US. You don't want to download hundreds of terabytes to your laptop! If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Imports and environment

In [3]:
# Standard packages
import tempfile
import warnings
import urllib.request  # urllib.request must be imported explicitly for urlretrieve
import os
import shutil
import pickle

# All of the following are pip- or conda-installable
import matplotlib.pyplot as plt
import numpy as np
import rasterio
import fiona
import rtree
import shapely
import reverse_geocoder as rg

from rasterio.plot import show
from fiona import transform
from shapely import geometry
from geopy.geocoders import Nominatim

latest_wkid = 3857
crs = "EPSG:4326"

# Storage locations are documented at http://aka.ms/ai4edata-naip
blob_root = 'https://naipblobs.blob.core.windows.net/naip'

index_files = ["tile_index.dat", "tile_index.idx", "tiles.p"]
index_blob_root = 'https://naipindex.blob.core.windows.net/allnaipindex/'
naip_temp_path = os.path.join(tempfile.gettempdir(),'naip')
os.makedirs(naip_temp_path,exist_ok=True)

index = None

warnings.filterwarnings("ignore")
%matplotlib inline

Functions

In [4]:
class NAIPTileIndex:
    """
    Utility class for performing NAIP tile lookups by location
    """    
    
    tile_rtree = None
    tile_index = None
    base_path = None
    
    def __init__(self, base_path=None):
        
        if base_path is None:
            
            base_path = naip_temp_path
            os.makedirs(base_path,exist_ok=True)
            
            for file_path in index_files:
                download_url(index_blob_root + file_path,base_path + '/' + file_path)
        
        self.base_path = base_path
        self.tile_rtree = rtree.index.Index(base_path + "/tile_index")
        with open(base_path + "/tiles.p", "rb") as f:
            self.tile_index = pickle.load(f)
      
    
    def lookup_tile(self, lat, lon):
        """
        Given a lat/lon coordinate pair, return the list of NAIP tiles that contain
        that location

        Returns a list with one entry per containing tile (the first field of
        that tile's tile_index record), or None if no tile contains the location
        """
        
        point = shapely.geometry.Point(float(lon),float(lat))
        intersected_indices = list(self.tile_rtree.intersection(point.bounds))

        intersected_files = []
        tile_intersection = False

        for idx in intersected_indices:

            intersected_file = self.tile_index[idx][0]
            intersected_geom = self.tile_index[idx][1]
            if intersected_geom.contains(point):
                tile_intersection = True
                intersected_files.append(intersected_file)

        if not tile_intersection and len(intersected_indices) > 0:
            print('Error: there are overlaps with tile index, '
                  'but no tile completely contains selection')
            return None
        elif len(intersected_files) <= 0:
            print("No tile intersections")
            return None
        else:
            return intersected_files
        
            
def download_url(url, destination_filename):
    """
    Utility function for downloading a URL to a local file
    """
    
    print('Downloading file {}'.format(os.path.basename(url)),end='')
    urllib.request.urlretrieve(url, destination_filename)  
    assert(os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    print('...done, {} bytes.'.format(nBytes))
    

def download_naip_tile(mrf_url):
    """
    Given the url of a NAIP .mrf file on Azure, download the mrf file along with 
    the associated .idx and .lrc files (which together constitute a NAIP tile) to
    a local temporary directory.  Returns the paths of all downloaded files.
    
    NAIP images consist of an mrf file (xml-formatted metadata), a binary index
    (.idx) file, and a .lrc file containing the actual pixel data.  The .mrf and
    .idx files are very small; a typical .lrc file may be in the hundreds of MB.
    """
    
    # Give each tile its own new subdirectory (instead of the private
    # tempfile._get_candidate_names API) and keep the original base name, so
    # the .mrf, .idx, and .lrc files sit side by side with a common base name
    tile_dir = tempfile.mkdtemp(dir=naip_temp_path)
    base_url = os.path.splitext(mrf_url)[0]
    base_filename = os.path.join(tile_dir, os.path.basename(base_url))

    destination_filenames = []
    for extension in ['.mrf', '.idx', '.lrc']:
        destination_filename = base_filename + extension
        download_url(base_url + extension, destination_filename)
        destination_filenames.append(destination_filename)

    return destination_filenames


def display_naip_tile(mrf_filename):
    """
    Given the path of a local NAIP .mrf file (e.g., the first file returned by
    download_naip_tile), display that tile via rasterio
    """
    
    assert(os.path.isfile(mrf_filename))
    
    # NAIP tiles are enormous; downsize for plotting in this notebook
    dsfactor = 10
    
    with rasterio.open(mrf_filename) as raster:

        # NAIP imagery has four channels: R, G, B, IR
        #
        # Stack RGB channels into an image; we won't try to render the IR channel
        #
        # rasterio uses 1-based indexing for channels.
        h = int(raster.height/dsfactor)
        w = int(raster.width/dsfactor)
        print('Resampling to {},{}'.format(h,w))
        r = raster.read(1, out_shape=(1, h, w))
        g = raster.read(2, out_shape=(1, h, w))
        b = raster.read(3, out_shape=(1, h, w))        
    
    rgb = np.dstack((r,g,b))
    fig = plt.figure(figsize=(7.5, 7.5), dpi=100, edgecolor='k')
    plt.imshow(rgb)
    
    
def get_coordinates_from_address(address):
    """
    Look up the latitude and longitude corresponding to an address
    """
    
    geolocator = Nominatim(user_agent="NAIP")
    location = geolocator.geocode(address)
    print('Retrieving location for address:\n{}'.format(location.address))
    return location.latitude, location.longitude
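NAIPTileIndex.lookup_tile works in two stages: an R-tree query returns tiles whose bounding boxes contain the point, then shapely's contains() keeps only tiles whose actual footprint contains it. The dependency-free sketch below reproduces that logic for illustration, with a linear bounding-box scan standing in for the R-tree and a ray-casting test standing in for shapely; the function names, tile names, and footprints are hypothetical.

```python
def ring_bounds(ring):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a vertex list."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return min(xs), min(ys), max(xs), max(ys)


def point_in_ring(x, y, ring):
    """Ray-casting point-in-polygon test; points exactly on an edge may go either way."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Count crossings of a horizontal ray running from (x, y) towards +x
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def lookup_tiles(lat, lon, tiles):
    """Return names of tiles whose footprint contains (lat, lon).

    tiles is a list of (name, ring) pairs, each ring a list of (lon, lat) vertices.
    """
    x, y = float(lon), float(lat)  # (lon, lat) order, as in Point(lon, lat)
    matches = []
    for name, ring in tiles:
        min_x, min_y, max_x, max_y = ring_bounds(ring)
        # Stage 1: cheap bounding-box test (the role of the R-tree query)
        if not (min_x <= x <= max_x and min_y <= y <= max_y):
            continue
        # Stage 2: exact containment (the role of shapely's contains())
        if point_in_ring(x, y, ring):
            matches.append(name)
    return matches


# Hypothetical footprints, in degrees
tiles = [
    ("square.mrf", [(0, 0), (1, 0), (1, 1), (0, 1)]),
    ("triangle.mrf", [(0, 0), (2, 0), (0, 2)]),
]
print(lookup_tiles(0.5, 0.5, tiles))  # ['square.mrf', 'triangle.mrf']
print(lookup_tiles(1.5, 1.5, tiles))  # [] : inside the triangle's bounding box only
```

The second call shows the case the notebook reports as an error: the point falls inside a tile's bounding box but outside its footprint. A linear scan is fine for a handful of tiles; the R-tree is what keeps lookups fast when the index holds many tiles.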

Access and plot a NAIP tile by constructing a path

In [5]:
# Tiles are stored at:
#
# [blob root]/data/v1/[year]/states/[state]/[state]_[resolution]_[year]/[quadrangle]/[filename]

year = '2011'
state = 'al'
resolution = '1m'
quadrangle = '30085'
filename = 'm_3008501_ne_16_1_20110815.mrf'
mrf_url = blob_root + '/data/v1/' + year + '/states/' + state + '/' + state + '_' + resolution + \
  '_' + year + '/' + quadrangle + '/' + filename

# Download the image
image_filenames = download_naip_tile(mrf_url)

# Plot the image
display_naip_tile(image_filenames[0])
Downloading file m_3008501_ne_16_1_20110815.mrf...done, 1214 bytes.
Downloading file m_3008501_ne_16_1_20110815.idx...done, 17408 bytes.
Downloading file m_3008501_ne_16_1_20110815.lrc...done, 230994615 bytes.
Resampling to 758,663
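The .lrc file above is about 231 MB (230,994,615 bytes). The notebook's download_url writes straight to the final file name with urllib.request.urlretrieve, which raises ContentTooShortError on a short read but leaves the partial file in place. A hedged alternative, sketched below as a hypothetical `download_url_atomic` helper, streams into a temporary ".part" file, checks the byte count against Content-Length when the server sends one, and renames the file into place only on success; the 1 MiB chunk size is an arbitrary choice.

```python
import os
import shutil
import urllib.request


def download_url_atomic(url, destination_filename, chunk_size=1 << 20):
    """Stream url to destination_filename through a temporary '.part' file.

    The destination name only appears after the full body has been written,
    so an interrupted or short download never looks like a complete file.
    Returns the number of bytes written.
    """
    partial_filename = destination_filename + ".part"
    try:
        with urllib.request.urlopen(url) as response, open(partial_filename, "wb") as out:
            expected = response.headers.get("Content-Length")
            # Copy in fixed-size chunks rather than holding the whole file in memory
            shutil.copyfileobj(response, out, chunk_size)
            written = out.tell()
        if expected is not None and int(expected) != written:
            raise OSError(f"short download: {written} of {expected} bytes")
        # Rename into place only after a complete, size-checked write
        os.replace(partial_filename, destination_filename)
    except BaseException:
        # Remove the partial file on any failure, including KeyboardInterrupt
        if os.path.exists(partial_filename):
            os.remove(partial_filename)
        raise
    return written
```

Because it takes the same (url, destination_filename) arguments as download_url, swapping it into download_naip_tile would be a one-line change.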