
NAIP


Aerial imagery from the National Agriculture Imagery Program (NAIP).

NAIP provides US-wide, high-resolution aerial imagery. The program is administered by the Aerial Field Photography Office (AFPO) within the US Department of Agriculture (USDA). The data are used for agricultural planning as well as a variety of land use classification applications.

Storage resources

Data are stored in blobs (one blob per image) in the East US data center, in the following blob container:

https://naipblobs.blob.core.windows.net/naip

Within that container, data are organized according to:

data/v1/[year]/states/[state]/[state]_[resolution]_[year]/[quadrangle]/filename

For example:

data/v1/2011/states/al/al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf

More information about these fields:

  • Year: four-digit year. Data are collected in each state every 3-5 years, with any given year containing some (but not all) states. For example, Alabama has data from 2011 and 2013, but not 2012, while California has data from 2012, but not 2011 or 2013. Esri provides information about NAIP coverage in their interactive NAIP annual coverage map.
  • State: two-letter state code.
  • Resolution: string specification of the image resolution; "1m" for all data currently available in this container, though this is subject to change.
  • Quadrangle: USGS quadrangle identifier, specifying a 7.5 minute by 7.5 minute block.
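
As a quick illustration, the example path above combines with the container URL into a complete blob URL. The following snippet is only a sketch, mirroring the construction used in the notebook later on this page:

blob_root = 'https://naipblobs.blob.core.windows.net/naip'

year = '2011'
state = 'al'
resolution = '1m'
quadrangle = '30085'
filename = 'm_3008501_ne_16_1_20110815.mrf'

mrf_url = f'{blob_root}/data/v1/{year}/states/{state}/{state}_{resolution}_{year}/{quadrangle}/{filename}'
print(mrf_url)
# https://naipblobs.blob.core.windows.net/naip/data/v1/2011/states/al/al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf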

Files are stored as Meta Raster Format (MRF) images (format specification), with each image represented by three files: an XML-formatted MRF metadata file, a binary index file (.idx), and a .lrc file containing the actual pixel data. These files were produced (from the original GeoTIFF format provided by USDA) and organized by Esri. The MRF format is both cloud-optimized and supported by GDAL.
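
Because the format is cloud-optimized, a tile can also be read directly from blob storage without downloading it first, for example with rasterio. This is only a sketch and assumes a GDAL build with curl support and the MRF driver; GDAL resolves the companion .idx and .lrc blobs relative to the .mrf URL:

import rasterio
from rasterio.windows import Window

url = ('https://naipblobs.blob.core.windows.net/naip/data/v1/2011/states/al/'
       'al_1m_2011/30085/m_3008501_ne_16_1_20110815.mrf')

with rasterio.open(url) as src:
    print(src.count, src.width, src.height)                    # 4 bands: R, G, B, NIR
    rgb = src.read([1, 2, 3], window=Window(0, 0, 512, 512))   # small RGB window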

A complete Python example of accessing and plotting a NAIP image is available in the notebook provided under "Data access".

We also provide a read-only SAS (shared access signature) token to allow access to NAIP data via, e.g., BlobFuse, which allows you to mount blob containers as drives:

st=2019-07-18T03%3A53%3A22Z&se=2035-07-19T03%3A53%3A00Z&sp=rl&sv=2018-03-28&sr=c&sig=2RIXmLbLbiagYnUd49rgx2kOXKyILrJOgafmkODhRAQ%3D

Mounting instructions for Linux are available here.
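
If you prefer working from Python rather than a mounted drive, the same SAS token can also be used with the azure-storage-blob package (v12 API) to enumerate or download blobs. The snippet below is an unofficial sketch:

from azure.storage.blob import ContainerClient

sas_token = ('st=2019-07-18T03%3A53%3A22Z&se=2035-07-19T03%3A53%3A00Z&sp=rl'
             '&sv=2018-03-28&sr=c&sig=2RIXmLbLbiagYnUd49rgx2kOXKyILrJOgafmkODhRAQ%3D')

container = ContainerClient(account_url='https://naipblobs.blob.core.windows.net',
                            container_name='naip',
                            credential=sas_token)

# List a handful of the 2011 Alabama blobs
for i, blob in enumerate(container.list_blobs(name_starts_with='data/v1/2011/states/al/')):
    print(blob.name)
    if i >= 4:
        break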

NAIP data can consume hundreds of terabytes, so large-scale processing is best performed in the East US data center, where the imagery is stored. If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Index

A list of all NAIP files is available as a zipped .txt file here:

https://naipblobs.blob.core.windows.net/naip-index/naip-index.zip
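
For example, the zipped list can be downloaded and filtered in a few lines of Python; this sketch iterates over the archive contents rather than assuming a particular file name inside the zip:

import io
import urllib.request
import zipfile

index_url = 'https://naipblobs.blob.core.windows.net/naip-index/naip-index.zip'
with urllib.request.urlopen(index_url) as response:
    z = zipfile.ZipFile(io.BytesIO(response.read()))

# Read every line of every text file in the archive
paths = []
for name in z.namelist():
    with z.open(name) as f:
        paths.extend(line.decode().strip() for line in f)

# E.g., all 2011 Alabama .mrf tiles
al_2011 = [p for p in paths if '/al_1m_2011/' in p and p.endswith('.mrf')]
print(len(al_2011))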

We also maintain a SQLite database to facilitate querying images by location and time; see the sample notebook for details.

You can also browse the data here.

Pretty picture


1m-resolution image of the area near Microsoft's Redmond campus in 2017.

Contact

For questions about this dataset, contact aiforearthdatasets@microsoft.com.

Access

Available in: Azure Notebooks
When to use: Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine.


Demo notebook for accessing NAIP data on Azure

This notebook provides an example of accessing NAIP data from blob storage on Azure, displaying an image using the rasterio library.

We will demonstrate how to access and plot a tile given a known tile filename, as well as how to access tiles by lat/lon.

NAIP data are stored in the East US data center, so this notebook will run most efficiently on Azure compute located in East US. We recommend that substantial computation depending on NAIP data also be situated in East US. You don't want to download hundreds of terabytes to your laptop! If you are using NAIP data for environmental science applications, consider applying for an AI for Earth grant to support your compute requirements.

Imports and environment

In [1]:
# Standard packages
import tempfile
import warnings
import urllib
import shutil
import os

# Less standard, but still pip- or conda-installable
import matplotlib.pyplot as plt
import numpy as np
import rasterio
import rtree
import shapely
import pickle

# pip install progressbar2, not progressbar
import progressbar

from geopy.geocoders import Nominatim

# EPSG identifiers: 3857 is Web Mercator, 4326 is WGS84 lat/lon
latest_wkid = 3857
crs = "EPSG:4326"

# Storage locations are documented at http://aka.ms/ai4edata-naip
blob_root = 'https://naipblobs.blob.core.windows.net/naip'

index_files = ["tile_index.dat", "tile_index.idx", "tiles.p"]
index_blob_root = 'https://naipindex.blob.core.windows.net/allnaipindex/'
temp_dir = os.path.join(tempfile.gettempdir(),'naip')
os.makedirs(temp_dir,exist_ok=True)

# Spatial index that maps lat/lon to NAIP tiles; we'll load this when we first 
# need to access it.
index = None

warnings.filterwarnings("ignore")
%matplotlib inline

Functions

In [2]:
class DownloadProgressBar():
    """
    https://stackoverflow.com/questions/37748105/how-to-use-progressbar-module-with-urlretrieve
    """
    
    def __init__(self):
        self.pbar = None

    def __call__(self, block_num, block_size, total_size):
        if not self.pbar:
            self.pbar = progressbar.ProgressBar(max_value=total_size)
            self.pbar.start()
            
        downloaded = block_num * block_size
        if downloaded < total_size:
            self.pbar.update(downloaded)
        else:
            self.pbar.finish()
            

class NAIPTileIndex:
    """
    Utility class for performing NAIP tile lookups by location.
    """
    
    tile_rtree = None
    tile_index = None
    base_path = None
    
    def __init__(self, base_path=None):
        
        if base_path is None:
            
            base_path = temp_dir
            os.makedirs(base_path,exist_ok=True)
            
            for file_path in index_files:
                download_url(index_blob_root + file_path, base_path + '/' + file_path,
                             progress_updater=DownloadProgressBar())
                
        self.base_path = base_path
        self.tile_rtree = rtree.index.Index(base_path + "/tile_index")
        self.tile_index = pickle.load(open(base_path  + "/tiles.p", "rb"))
      
    
    def lookup_tile(self, lat, lon):
        """"
        Given a lat/lon coordinate pair, return the list of NAIP tiles that contain
        that location.

        Returns an array containing [mrf filename, idx filename, lrc filename].
        """

        point = shapely.geometry.Point(float(lon),float(lat))
        intersected_indices = list(self.tile_rtree.intersection(point.bounds))

        intersected_files = []
        tile_intersection = False

        for idx in intersected_indices:

            intersected_file = self.tile_index[idx][0]
            intersected_geom = self.tile_index[idx][1]
            if intersected_geom.contains(point):
                tile_intersection = True
                intersected_files.append(intersected_file)

        if not tile_intersection and len(intersected_indices) > 0:
            print('''Error: there are overlaps with tile index,
                      but no tile completely contains selection''')
            return None
        elif len(intersected_files) <= 0:
            print("No tile intersections")
            return None
        else:
            return intersected_files
        
            
def download_url(url, destination_filename=None, progress_updater=None, force_download=False):
    """
    Download a URL to a temporary file
    """
    
    # This is not intended to guarantee uniqueness, we just know it happens to guarantee
    # uniqueness for this application.
    if destination_filename is None:
        url_as_filename = url.replace('://', '_').replace('/', '_')    
        destination_filename = \
            os.path.join(temp_dir,url_as_filename)
    if (not force_download) and (os.path.isfile(destination_filename)):
        print('Bypassing download of already-downloaded file {}'.format(os.path.basename(url)))
        return destination_filename
    print('Downloading file {} to {}'.format(os.path.basename(url),destination_filename),end='')
    urllib.request.urlretrieve(url, destination_filename, progress_updater)  
    assert(os.path.isfile(destination_filename))
    nBytes = os.path.getsize(destination_filename)
    print('...done, {} bytes.'.format(nBytes))
    return destination_filename
    

def download_naip_tile(mrf_url):
    """
    Given the url of a NAIP .mrf file on Azure, download the mrf file along with 
    the associated .idx and .lrc files (which together constitute a NAIP tile) to
    a local temporary directory.  Returns the paths of all downloaded files.
    
    NAIP images consist of an .mrf file (xml-formatted metadata), a binary index
    (.idx) file, and a .lrc file containing the actual pixel data.  The .mrf and
    .idx files are very small; a typical .lrc file may be in the hundreds of MB.
    """
    
    mrf_filename = os.path.join(temp_dir,mrf_url.replace('://', '_').replace('/', '_'))
    
    source_urls = [mrf_url]
    destination_filenames = [mrf_filename]

    source_urls.append(mrf_url.replace('.mrf','.idx'))
    destination_filenames.append(mrf_filename.replace('.mrf','.idx'))
    source_urls.append(mrf_url.replace('.mrf','.lrc'))
    destination_filenames.append(mrf_filename.replace('.mrf','.lrc'))

    for iFile in range(0,3):
        download_url(source_urls[iFile], destination_filenames[iFile], 
                     progress_updater=DownloadProgressBar())
        
    return destination_filenames


def display_naip_tile(filename):
    """
    Display a NAIP tile using rasterio.
    
    For .mrf-formatted tiles (which span multiple files), 'filename' should refer to the 
    .mrf file.
    """
    
    # NAIP tiles are enormous; downsize for plotting in this notebook
    dsfactor = 10
    
    with rasterio.open(filename) as raster:

        # NAIP imagery has four channels: R, G, B, IR
        #
        # Stack RGB channels into an image; we won't try to render the IR channel
        #
        # rasterio uses 1-based indexing for channels.
        h = int(raster.height/dsfactor)
        w = int(raster.width/dsfactor)
        print('Resampling to {},{}'.format(h,w))
        r = raster.read(1, out_shape=(1, h, w))
        g = raster.read(2, out_shape=(1, h, w))
        b = raster.read(3, out_shape=(1, h, w))        
    
    rgb = np.dstack((r,g,b))
    fig = plt.figure(figsize=(7.5, 7.5), dpi=100, edgecolor='k')
    plt.imshow(rgb)
    
    
def get_coordinates_from_address(address):
    """
    Look up the lat/lon coordinates for an address.
    """
    
    geolocator = Nominatim(user_agent="NAIP")
    location = geolocator.geocode(address)
    print('Retrieving location for address:\n{}'.format(location.address))
    return location.latitude, location.longitude

Access and plot a NAIP tile by constructing a path

In [3]:
# Tiles are stored at:
#
# [blob root]/data/v1/[year]/states/[state]/[state]_[resolution]_[year]/[quadrangle]/[filename]

year = '2011'
state = 'al'
resolution = '1m'
quadrangle = '30085'
filename = 'm_3008501_ne_16_1_20110815.mrf'
mrf_url = blob_root + '/data/v1/' + year + '/states/' + state + '/' + state + '_' + resolution + \
  '_' + year + '/' + quadrangle + '/' + filename

# Download the image
image_filenames = download_naip_tile(mrf_url)

# Plot the image
print('Reading file:\n{}'.format(os.path.basename(image_filenames[0])))
assert os.path.isfile(image_filenames[0])
display_naip_tile(image_filenames[0])
Bypassing download of already-downloaded file m_3008501_ne_16_1_20110815.mrf
Bypassing download of already-downloaded file m_3008501_ne_16_1_20110815.idx
Bypassing download of already-downloaded file m_3008501_ne_16_1_20110815.lrc
Reading file:
https_naipblobs.blob.core.windows.net_naip_data_v1_2011_states_al_al_1m_2011_30085_m_3008501_ne_16_1_20110815.mrf
Resampling to 758,663
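
Access and plot a NAIP tile by lat/lon

The cell below sketches the by-location workflow mentioned above, using the functions defined earlier. It assumes that the spatial index stores container-relative .mrf paths (so that lookup_tile results can be appended to blob_root); the street address is only an example, so substitute your own location.

In [ ]:
# Build (or download) the spatial index the first time we need it
index = NAIPTileIndex()

# Geocode an address of interest
lat, lon = get_coordinates_from_address('15255 NE 40th St, Redmond, WA 98052')

# Find tiles containing this point, then download and plot the first match
naip_files = index.lookup_tile(lat, lon)

if naip_files is not None:
    mrf_url = blob_root + '/' + naip_files[0]
    image_filenames = download_naip_tile(mrf_url)
    display_naip_tile(image_filenames[0])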