Video data from the Ocean Observatories Initiative seafloor camera deployed at Axial Volcano on the Juan de Fuca Ridge.
Overview
The OOICloud Project is making data from the Ocean Observatories Initiative (OOI) publicly available on Azure Open Datasets and accessible through a Pangeo interface. A primary goal of the project is to provide these data to the scientific community using a cloud-performant object storage model, and to provide large-scale data-proximate compute capabilities for research investigations. The OOI sensor network consists of 89 scientific platforms with approximately 830 instruments, and provides nearly 5 TB of data each month for the study of the ocean-atmosphere system from the continental margins to the mid-ocean ridges. A core component of OOI is the Regional Cabled Array, which uses a fiber-optic cable to connect and power the largest array of networked oceanographic instruments in the world, delivering data to shore in real time.
CamHD is a high-definition video camera connected to the OOI’s fiber-optic cable at Axial Seamount. It provides data that can support a wide range of oceanographic, biological, and geophysical investigations. Every three hours, the camera scans a hydrothermal vent chimney, imaging the entire chimney over the course of about fifteen minutes. The notebook provided under “data access” shows how to load video data from CamHD and demonstrates basic usage of the pycamhd library, which can be used to extract frames from the ProRes-encoded QuickTime files.
Storage resources
All available video files are listed in a JSON file that includes useful information such as the Unix timestamp (in seconds) of the first frame in each video and the total number of frames in each video. Data are stored as block blobs on Azure Blob storage in the following container:
https://ooiopendata.blob.core.windows.net/camhd
We also provide a read-only SAS (shared access signature) token to allow access to CamHD data via, e.g., BlobFuse, which allows you to mount blob containers as drives:
?sv=2019-12-12&si=camhd-aod-ro&sr=c&sig=zFVfMOqa1YW9mxbEusUsKfPrKjkBFyD2YAUJficSuCo%3D
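For tools that take a plain HTTPS URL rather than a mounted drive, a SAS token is normally appended to the blob URL as its query string. The following is a minimal sketch of that step, not part of the original notebook. The helper name `blob_url_with_sas` is hypothetical; the container URL, the token, and the blob name `dbcamhd.json` are the ones given on this page.

```python
def blob_url_with_sas(container_url: str, blob_name: str, sas_token: str) -> str:
    """Return an HTTPS URL for one blob, with the SAS token as its query string.

    Hypothetical helper for illustration; it only does string handling.
    """
    base = container_url.rstrip("/")
    # Accept the token with or without its leading "?"
    query = sas_token.lstrip("?")
    return f"{base}/{blob_name.lstrip('/')}?{query}"


CONTAINER_URL = "https://ooiopendata.blob.core.windows.net/camhd"
SAS_TOKEN = "?sv=2019-12-12&si=camhd-aod-ro&sr=c&sig=zFVfMOqa1YW9mxbEusUsKfPrKjkBFyD2YAUJficSuCo%3D"

# Build the URL of the metadata file inside the container
metadata_url = blob_url_with_sas(CONTAINER_URL, "dbcamhd.json", SAS_TOKEN)
print(metadata_url)
```

The token is already percent-encoded (note the trailing `%3D`), so the sketch concatenates it as-is instead of encoding it again.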
Mounting instructions for Linux are here. Large-scale processing using this dataset is best performed in the East US Azure data center, where the data are stored. Computational resources are available at ooi.pangeo.io, and if you are using CamHD data for environmental science applications, you may also consider applying for an AI for Earth grant to support your compute requirements.
Pretty picture
The HD camera (orange triangular frame) images the 14 ft-tall actively venting hot spring deposit “Mushroom” located within the caldera for Axial Seamount. The vent rests on an old lava flow. Radiating cracks in the flow are filled with white bacterial mats and small tube worms, marking sites of diffusely flowing fluids that issue from the fractures in the basalt. The 3-D temperature array in the background encloses a tube worm bush, sending 24 temperature measurements live to shore every second.
Contact
For questions about this dataset, contact aiforearthdatasets@microsoft.com.
Notices
MICROSOFT PROVIDES AZURE OPEN DATASETS ON AN “AS IS” BASIS. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, GUARANTEES OR CONDITIONS WITH RESPECT TO YOUR USE OF THE DATASETS. TO THE EXTENT PERMITTED UNDER YOUR LOCAL LAW, MICROSOFT DISCLAIMS ALL LIABILITY FOR ANY DAMAGES OR LOSSES, INCLUDING DIRECT, CONSEQUENTIAL, SPECIAL, INDIRECT, INCIDENTAL OR PUNITIVE, RESULTING FROM YOUR USE OF THE DATASETS.
This dataset is provided under the original terms that Microsoft received source data. The dataset may include data sourced from Microsoft.
Access
Available in | When to use |
---|---|
Azure Notebooks | Quickly explore the dataset with Jupyter notebooks hosted on Azure or your local machine. |
# Standard packages
import numpy as np
import pandas as pd
import fsspec
import time
import datetime
import random
import matplotlib.pyplot as plt
from ipywidgets import interact
from ipywidgets import IntSlider
# Non-standard, but still pip- or conda-installable
import pycamhd as camhd
# .json file containing video metadata
dbcamhd_url = 'https://ooiopendata.blob.core.windows.net/camhd/dbcamhd.json'
with fsspec.open(dbcamhd_url) as f:
    dbcamhd = pd.read_json(f, orient="records", lines=True)
dbcamhd.tail()
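# Find files from September 20, 2017 (UTC)...
start_time = datetime.datetime(2017, 9, 20, tzinfo=datetime.timezone.utc)
end_time = datetime.datetime(2017, 9, 21, tzinfo=datetime.timezone.utc)
# Timezone-aware datetimes give Unix seconds that do not depend on the local timezone
start_unixtime = start_time.timestamp()
end_unixtime = end_time.timestamp()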
matching_rows = dbcamhd[dbcamhd['timestamp'].between(start_unixtime,end_unixtime)]
matching_rows
# ...and choose the first file from that day.
mov = matching_rows.iloc[0]
mov
def show_image(frame_number):
    plt.rc('figure', figsize=(12, 6))
    plt.rcParams.update({'font.size': 8})
    frame = camhd.get_frame(mov.url, frame_number)
    fig, ax = plt.subplots()
    im1 = ax.imshow(frame)
    plt.yticks(np.arange(0,1081,270))
    plt.xticks(np.arange(0,1921,480))
    plt.title('Deployment: %s File: %s Frame: %s' % (mov.deployment, mov['name'], frame_number))
# Choose a random frame
initial_frame = random.randrange(0,mov.frame_count)
print('Showing frame {}'.format(initial_frame))
show_image(initial_frame)
frame_slider = IntSlider(min=0, max=mov.frame_count-1, step=1, value=initial_frame, continuous_update=False)
interact(show_image, frame_number=frame_slider)
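Because the metadata file records the Unix timestamp of the first frame in each video, the approximate capture time of any later frame can be estimated from its frame number. The sketch below is an illustration under stated assumptions, not part of the original notebook. The helper name `frame_time_utc` is hypothetical, and the frame rate is passed in by the caller because this page does not state it; the values used in the example are placeholders, not real dataset entries.

```python
import datetime


def frame_time_utc(first_frame_timestamp: float, frame_number: int, fps: float) -> datetime.datetime:
    """Estimate the UTC capture time of a frame.

    Assumes a constant frame rate `fps` (supplied by the caller, not taken
    from the dataset) and that frame 0 was captured at `first_frame_timestamp`.
    """
    offset_seconds = frame_number / fps
    return datetime.datetime.fromtimestamp(
        first_frame_timestamp + offset_seconds, tz=datetime.timezone.utc
    )


# Placeholder values: midnight UTC on 2017-09-20 and an assumed 30 frames per second
example = frame_time_utc(1505865600, 300, fps=30.0)
print(example.isoformat())  # 2017-09-20T00:00:10+00:00
```

With the `mov` row selected in the notebook, the same helper could be called as `frame_time_utc(mov.timestamp, frame_number, fps)` once the actual frame rate of the video has been confirmed.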