
NAIP

AerialImagery AIforEarth USDA

Aerial imagery from the National Agriculture Imagery Program (NAIP).

NAIP provides US-wide, high-resolution aerial imagery. This program is administered by the Aerial Photography Field Office (APFO) within the US Department of Agriculture (USDA). Data are available from 2010 to the present.

Storage resources

Data are stored in cloud-optimized GeoTIFF files in Azure Blob Storage in the East US Azure region, in the following blob container:

https://naipblobs.blob.core.windows.net/naip

Within that container, data are organized according to:

v002/[state]/[year]/[state]_[resolution]_[year]/[quadrangle]/[filename].tif

…for example:

v002/al/2015/al_100cm_2015/30086/m_3008601_ne_16_1_20150804.tif

More details on these fields:

  • Year: Four-digit year. Images are collected in each state every 3-5 years, with any given year containing some (but not all) states. For example, Alabama has data in 2011 and 2013, but not in 2012, while California has data in 2012, but not 2011 or 2013. Esri provides information about NAIP coverage in their interactive NAIP annual coverage map.
  • State: Two-letter state code.
  • Resolution: String specification of image resolution, which has varied throughout NAIP’s history. Depending on year and state, this may be “050cm”, “060cm”, or “100cm”.
  • Quadrangle: USGS quadrangle identifier, specifying a 7.5 minute x 7.5 minute area.

The filename component of the path (m_3008601_ne_16_1_20150804 in this example) is preserved from USDA’s original archive to allow consistent referencing across different copies of NAIP. Minor variation in file naming exists, but filenames are generally formatted as:

m_[quadrangle]_[quarter-quad]_[utm zone]_[resolution]_[capture date].tif

…for example, the above file is in USGS quadrangle 30086, in the NE quarter-quad, which is in UTM zone 16, with 1m resolution, and was captured on 8/4/2015. In some cases, an additional date may be appended to the filename; in these cases, the first date represents the capture date, and the second date represents the date on which a corrected version of the image was released. For example:

v002/nc/2018/nc_060cm_2018/36077/m_3607744_se_18_060_20180903_20190210.tif

…was captured on 9/3/2018, and re-released on 2/10/2019. If you’re reading this because you want to digest this filename, the first date is almost definitely what you’re interested in.
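The naming convention above can be turned into a small parser. This is a minimal sketch, not part of the official notebook: the names NaipFilename and parse_naip_filename are hypothetical, the regular expression only covers the two filename shapes shown above, and the resolution field is kept as a raw string because its encoding varies ("1" in one example, "060" in the other). The quadrangle property assumes the directory-level quadrangle is the first five digits of the filename's numeric field, which holds for both examples but is not stated by the source.

```python
import re
from datetime import date, datetime
from typing import NamedTuple, Optional


class NaipFilename(NamedTuple):
    """Fields recovered from a NAIP filename (hypothetical helper type)."""
    quad_id: str                  # numeric field as written, e.g. "3008601"
    quarter_quad: str             # e.g. "ne"
    utm_zone: int                 # e.g. 16
    resolution: str               # raw string, e.g. "1" or "060"
    capture_date: date
    release_date: Optional[date]  # only present for re-released images

    @property
    def quadrangle(self) -> str:
        # Assumption: the quadrangle directory is the first 5 digits.
        return self.quad_id[:5]


_NAIP_RE = re.compile(
    r"^m_(?P<quad>\d+)_(?P<qq>[a-z]{2})_(?P<zone>\d+)_(?P<res>[^_]+)"
    r"_(?P<captured>\d{8})(?:_(?P<released>\d{8}))?\.tif$"
)


def _to_date(text: str) -> date:
    return datetime.strptime(text, "%Y%m%d").date()


def parse_naip_filename(path: str) -> NaipFilename:
    """Parse a NAIP filename; any leading directories are ignored."""
    name = path.rsplit("/", 1)[-1]
    match = _NAIP_RE.match(name)
    if match is None:
        raise ValueError("Unrecognized NAIP filename: {}".format(name))
    released = match.group("released")
    return NaipFilename(
        quad_id=match.group("quad"),
        quarter_quad=match.group("qq"),
        utm_zone=int(match.group("zone")),
        resolution=match.group("res"),
        capture_date=_to_date(match.group("captured")),
        release_date=_to_date(released) if released else None,
    )
```

Because filenames vary slightly across the archive, a ValueError from this sketch means only that the name falls outside the two patterns documented here, not that the file is invalid.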

Files are stored as cloud-optimized GeoTIFF images, with a .tif extension. These files were produced (from the original, USDA-provided format) and organized by Esri.

Small thumbnails are also available for each image; substitute “.tif” with “.200.jpg” to retrieve the thumbnail. For example, a thumbnail rendering of the image used in the naming convention example above is available at:

https://naipblobs.blob.core.windows.net/naip/v002/al/2015/al_100cm_2015/30086/m_3008601_ne_16_1_20150804.200.jpg
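The substitution described above is a one-line string operation. The helper name thumbnail_url below is hypothetical; the sketch simply swaps the ".tif" suffix for ".200.jpg" and rejects anything that does not end in ".tif".

```python
def thumbnail_url(tile_url: str) -> str:
    """Return the thumbnail URL for a NAIP tile URL ending in '.tif'."""
    suffix = ".tif"
    if not tile_url.endswith(suffix):
        raise ValueError("Expected a .tif URL, got: {}".format(tile_url))
    # Replace only the trailing extension, not any '.tif' earlier in the path
    return tile_url[: -len(suffix)] + ".200.jpg"
```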

A complete Python example of accessing and plotting a NAIP image is available in the notebook provided under “data access”.

We also provide a read-only SAS (shared access signature) token to allow access to NAIP data via, e.g., BlobFuse, which allows you to mount blob containers as drives:

sv=2019-10-10&si=naip-ro&sr=c&sig=W2mWBv2Rb8%2BQE7N2KNoFstIsoQru5PnZ2m%2B4HhTlHEU%3D

Mounting instructions for Linux are here.
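Outside of BlobFuse, a SAS token is normally passed as the query string of a blob URL. The sketch below shows that composition for a NAIP blob path; the function name signed_blob_url is hypothetical, and whether the token shown above is still valid has not been verified here.

```python
NAIP_CONTAINER_URL = "https://naipblobs.blob.core.windows.net/naip"
NAIP_SAS_TOKEN = (
    "sv=2019-10-10&si=naip-ro&sr=c"
    "&sig=W2mWBv2Rb8%2BQE7N2KNoFstIsoQru5PnZ2m%2B4HhTlHEU%3D"
)


def signed_blob_url(blob_path: str,
                    container_url: str = NAIP_CONTAINER_URL,
                    sas_token: str = NAIP_SAS_TOKEN) -> str:
    """Join container URL, blob path, and SAS token into one URL."""
    # The token is already URL-encoded, so it is appended unchanged.
    return "{}/{}?{}".format(
        container_url.rstrip("/"),
        blob_path.lstrip("/"),
        sas_token.lstrip("?"),
    )
```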

NAIP data can consume hundreds of terabytes, so large-scale processing is best performed in the East US Azure data center, where the images are stored. If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Index

A list of all NAIP files is available here, as a zipped .csv file:

https://naipblobs.blob.core.windows.net/naip-index/naip_v002_index.zip
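The zipped index can be read without unpacking it to disk. The following is a sketch under stated assumptions: the archive is assumed to contain one or more .csv members encoded as UTF-8, and because the column layout is not documented here, rows are returned as raw lists of strings. The function name iter_index_rows is hypothetical.

```python
import csv
import io
import zipfile
from typing import Iterator, List


def iter_index_rows(zip_bytes: bytes) -> Iterator[List[str]]:
    """Yield CSV rows from every .csv member of an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for member in archive.namelist():
            if not member.lower().endswith(".csv"):
                continue
            with archive.open(member) as raw:
                # Decode bytes to text; newline="" lets csv handle line endings
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                yield from csv.reader(text)


# Example usage (requires network access, so it is left commented out):
# import urllib.request
# index_url = "https://naipblobs.blob.core.windows.net/naip-index/naip_v002_index.zip"
# rows = iter_index_rows(urllib.request.urlopen(index_url).read())
# print(next(rows))
```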

We also maintain an rtree object to facilitate spatial queries for Python users; see the sample notebook for details.

Data can also be browsed here.

Where did the .mrf files go?

In June of 2020, we updated our entire NAIP archive to improve both coverage and maintainability. We also switched from .mrf format to cloud-optimized GeoTIFF, and made some changes to path structures. The .mrf files are temporarily still available in another container; if they are important to your work and you need access, contact aiforearthdatasets@microsoft.com.

Pretty picture


1m-resolution imagery of the area near Microsoft’s Redmond Campus in 2017.

Contact

For questions about this dataset, contact aiforearthdatasets@microsoft.com.

Access

Available in: Azure Notebooks
When to use: Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.

Language: Python

Demo notebook for accessing NAIP data on Azure

This notebook provides an example of accessing NAIP data from blob storage on Azure, displaying an image using the rasterio library.

We will demonstrate how to access and plot a tile given a known tile filename, as well as how to access tiles by lat/lon. Finally, we'll demonstrate how to retrieve only the patches you care about from our cloud-optimized image files.
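As a preview of the patch-retrieval idea, here is a minimal sketch of a windowed read. It assumes rasterio can open the tile URL directly over HTTPS and uses rasterio.windows.Window to request only a rectangle of pixels, so only the parts of the file that cover that rectangle need to be fetched. The helper names clamp_window and read_rgb_patch are hypothetical, and read_rgb_patch is not executed here because it needs rasterio and network access.

```python
def clamp_window(col_off, row_off, width, height, raster_width, raster_height):
    """Clip a pixel window to the raster extent.

    Returns (col_off, row_off, width, height); width or height may be 0
    if the requested window lies entirely outside the raster.
    """
    col0 = max(0, min(col_off, raster_width))
    row0 = max(0, min(row_off, raster_height))
    col1 = max(col0, min(col_off + width, raster_width))
    row1 = max(row0, min(row_off + height, raster_height))
    return col0, row0, col1 - col0, row1 - row0


def read_rgb_patch(tile_url, col_off, row_off, width, height):
    """Read an RGB patch from a (possibly remote) GeoTIFF as an (H, W, 3) array."""
    # Imported lazily so clamp_window stays usable without rasterio installed
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(tile_url) as raster:
        c, r, w, h = clamp_window(col_off, row_off, width, height,
                                  raster.width, raster.height)
        # Bands 1-3 are R, G, B; rasterio band indexes are 1-based
        patch = raster.read([1, 2, 3], window=Window(c, r, w, h))

    # rasterio returns (bands, rows, cols); move bands last for plotting
    return patch.transpose(1, 2, 0)
```

Clamping first keeps the request inside the image, so a window that overhangs the tile edge returns a smaller patch instead of one padded with fill values.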

NAIP data are stored in the East US data center, so this notebook will run most efficiently on Azure compute located in East US. We recommend that substantial computation depending on NAIP data also be situated in East US. You don't want to download hundreds of terabytes to your laptop! If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Imports and environment

In [1]:
# Standard packages
import tempfile
import warnings
import urllib.request  # "import urllib" alone does not load the request submodule
import shutil
import pickle
import os

# Workaround for a problem in older rasterio versions
os.environ["CURL_CA_BUNDLE"] = "/etc/ssl/certs/ca-certificates.crt"

# Less standard, but still pip- or conda-installable
import matplotlib.pyplot as plt
import numpy as np
import rasterio
import rtree
import shapely.geometry  # Point is used for tile lookups

# pip install progressbar2, not progressbar
import progressbar

from geopy.geocoders import Nominatim
from rasterio.windows import Window
from tqdm import tqdm

latest_wkid = 3857
crs = "EPSG:4326"

# Storage locations are documented at http://aka.ms/ai4edata-naip
blob_root = 'https://naipblobs.blob.core.windows.net/naip'

index_files = ["tile_index.dat", "tile_index.idx", "tiles.p"]
index_blob_root = 'https://naipblobs.blob.core.windows.net/naip-index/rtree/'
temp_dir = os.path.join(tempfile.gettempdir(),'naip')
os.makedirs(temp_dir,exist_ok=True)

# Spatial index that maps lat/lon to NAIP tiles; we'll load this when we first 
# need to access it.
index = None

# URL where we've stashed a geojson file with the boundaries of Maryland.  Why do we
# need the boundaries of Maryland?  It's a surprise, you'll have to keep reading to find
# out.
maryland_boundary_url = 'https://ai4edatasetspublicassets.blob.core.windows.net/assets/maryland.json'

warnings.filterwarnings("ignore")
%matplotlib inline

Functions

In [2]:
class DownloadProgressBar():
    """
    https://stackoverflow.com/questions/37748105/how-to-use-progressbar-module-with-urlretrieve
    """
    
    def __init__(self):
        self.pbar = None

    def __call__(self, block_num, block_size, total_size):
        if not self.pbar:
            self.pbar = progressbar.ProgressBar(max_value=total_size)
            self.pbar.start()
            
        downloaded = block_num * block_size
        if downloaded < total_size:
            self.pbar.update(downloaded)
        else:
            self.pbar.finish()
            

class NAIPTileIndex:
    """
    Utility class for performing NAIP tile lookups by location.
    """
    
    tile_rtree = None
    tile_index = None
    base_path = None
    
    def __init__(self, base_path=None):
        
        if base_path is None:
            
            base_path = temp_dir
            os.makedirs(base_path,exist_ok=True)
            
            for file_path in index_files:
                download_url(index_blob_root + file_path, base_path + '/' + file_path,
                             progress_updater=DownloadProgressBar())
                
        self.base_path = base_path
        self.tile_rtree = rtree.index.Index(base_path + "/tile_index")
        # Close the pickle file promptly rather than leaking the handle
        with open(base_path + "/tiles.p", "rb") as f:
            self.tile_index = pickle.load(f)
      
    
    def lookup_tile(self, lat, lon):
        """
        Given a lat/lon coordinate pair, return the list of NAIP tiles that contain
        that location.

        Returns a list of tile filenames as stored in the index, or None if no
        tile contains the point.
        """

        point = shapely.geometry.Point(float(lon),float(lat))
        intersected_indices = list(self.tile_rtree.intersection(point.bounds))

        intersected_files = []
        tile_intersection = False

        for idx in intersected_indices:

            intersected_file = self.tile_index[idx][0]
            intersected_geom = self.tile_index[idx][1]
            if intersected_geom.contains(point):
                tile_intersection = True
                intersected_files.append(intersected_file)

        if not tile_intersection and len(intersected_indices) > 0:
            print('Error: there are overlaps with tile index, '
                  'but no tile completely contains selection')
            return None
        elif len(intersected_files) <= 0:
            print("No tile intersections")
            return None
        else:
            return intersected_files
        
            
def download_url(url, destination_filename=None, progress_updater=None, force_download=False):
    """
    Download a URL to a temporary file
    """
    
    # This is not intended to guarantee uniqueness, we just know it happens to guarantee
    # uniqueness for this application.
    if destination_filename is None:
        url_as_filename = url.replace('://', '_').replace('/', '_')    
        destination_filename = \
            os.path.join(temp_dir,url_as_filename)
    if (not force_download) and (os.path.isfile(destination_filename)):
        print('Bypassing download of already-downloaded file {}'.format(os.path.basename(url)))
        return destination_filename
    print('Downloading file {} to {}'.format(os.path.basename(url),destination_filename),end='')
    urllib.request.urlretrieve(url, destination_filename, progress_updater)  
    assert(os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    print('...done, {} bytes.'.format(nBytes))
    return destination_filename
    

def display_naip_tile(filename):
    """
    Display a NAIP tile using rasterio.
    
    'filename' can be a local path or a URL to a cloud-optimized GeoTIFF. For legacy
    .mrf-formatted tiles (which span multiple files), 'filename' should refer to the 
    .mrf file.
    """
    
    # NAIP tiles are enormous; downsize for plotting in this notebook
    dsfactor = 10
    
    with rasterio.open(filename) as raster:

        # NAIP imagery has four channels: R, G, B, IR
        #
        # Stack RGB channels into an image; we won't try to render the IR channel
        #
        # rasterio uses 1-based indexing for channels.
        h = int(raster.height/dsfactor)
        w = int(raster.width/dsfactor)
        print('Resampling to {},{}'.format(h,w))
        r = raster.read(1, out_shape=(1, h, w))
        g = raster.read(2, out_shape=(1, h, w))
        b = raster.read(3, out_shape=(1, h, w))        
    
    # The "with" block above has already closed the dataset
    rgb = np.dstack((r,g,b))
    fig = plt.figure(figsize=(7.5, 7.5), dpi=100, edgecolor='k')
    plt.imshow(rgb)
    
    
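def get_coordinates_from_address(address):
    """
    Look up the lat/lon coordinates for an address.
    """
    
    geolocator = Nominatim(user_agent="NAIP")
    location = geolocator.geocode(address)
    # geocode() returns None when the address cannot be resolved
    if location is None:
        raise ValueError('Could not geocode address: {}'.format(address))
    print('Retrieving location for address:\n{}'.format(location.address))
    return location.latitude, location.longitude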

Access and plot a NAIP tile by constructing a path

In [3]:
# Tiles are stored at:
#
# [blob root]/v002/[state]/[year]/[state]_[resolution]_[year]/[quadrangle]/filename

year = '2015'
state = 'al'
resolution = '100cm'
quadrangle = '30086'
filename = 'm_3008601_ne_16_1_20150804.tif'
tile_url = blob_root + '/v002/' + state + '/' + year + '/' + state + '_' + resolution + \
    '_' + year + '/' + quadrangle + '/' + filename

print(tile_url)

# Download the image
image_filename = download_url(tile_url,progress_updater=DownloadProgressBar())

# Plot the image
print('Reading file:\n{}'.format(os.path.basename(image_filename)))
assert os.path.isfile(image_filename)
display_naip_tile(image_filename)

https://naipblobs.blob.core.windows.net/naip/v002/al/2015/al_100cm_2015/30086/m_3008601_ne_16_1_20150804.tif
Bypassing download of already-downloaded file m_3008601_ne_16_1_20150804.tif
Reading file:
https_naipblobs.blob.core.windows.net_naip_v002_al_2015_al_100cm_2015_30086_m_3008601_ne_16_1_20150804.tif
Resampling to 753,657