Cooling down storage costs in the healthcare AI blueprint

Posted on October 2, 2018

Principal Systems Architect, Microsoft Azure

Artificial Intelligence (AI) and Machine Learning (ML) are transforming healthcare, from streamlining operations to aiding in clinical diagnosis. Yet healthcare organizations often find it hard to begin an AI/ML journey because of a lack of experience or high cost.

The Azure Healthcare AI blueprint installs a HIPAA and HITRUST compliant environment in Azure for managing and running healthcare AI experiments. This provides a quick start to your AI/ML efforts and can get technical staff proficient with a reference implementation very quickly and with little cost.

Since it is a reference implementation, you must consider the ongoing costs to maintain the blueprint infrastructure in production. One place to look for easy savings is in storage. In this entry, we’ll discuss features of Azure Blob Storage, and practices to lower the cost of blob storage.


The case for more storage: AI and cognitive services

The blueprint is designed to ease the learning and implementation of AI/ML in a healthcare organization. It includes a “patient length of stay” experiment that uses .csv files, which take up little room. But consider other data that could be used for machine learning, such as radiology (x-ray) and MRI data along with other radiological images. And as AI services become part of the mainstream, you could also be storing video and audio files, because cognitive services can transcribe audio or tag photographs. In sum, as the capabilities of AI grow, the need for blob storage can expand dramatically.

Required storage

Often, a workload in Azure starts with saving a file into blob storage. It is important to keep a copy of the original data as the “source of truth” for long periods of time, sometimes several years, to comply with retention regulations or policies. Keeping the original data also means that any operation performed on it can be repeated.

When data is placed into blob storage it is considered “hot” stored data, which is available anytime you need it. Data is retrieved immediately upon request.

Once data in blob storage has been used — say for an ML Studio experiment — it may not be necessary to hold it in hot storage. In cases where the data needs to be kept but is not accessed very often, there are ways to lower the costs.

Using tiered Blob Storage

Blob Storage has three data tiers, each based on how often the data is retrieved, and each with a different cost.

  • If data is accessed frequently, it belongs in hot storage. This is the most accessible, but also the most expensive, Blob Storage option.
  • If the data is accessed less frequently and will be stored for at least 30 days, it is a prime candidate for cool storage, which offers the same quick and easy access as hot storage but is intended for data that is not accessed often.
  • The archive storage tier is optimized for scenarios in which the data is held for at least 180 days and does not need to be immediately available upon query. It can take several hours to retrieve data from archive storage.

Data may also be purged on a given time interval. For example, it may be necessary to keep healthcare diagnostic data for a mandated period of time. When that time runs out, however, the data should be removed from storage.
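The tier descriptions above can be read as a small decision rule. The following is a minimal sketch of that rule in Python, not an official Azure algorithm: the function names, the `can_wait_hours` flag, and the idea of reducing "accessed frequently" to a boolean are all assumptions made for illustration. Only the 30-day and 180-day thresholds come from the tier descriptions.

```python
from datetime import date

# Thresholds taken from the tier descriptions above.
COOL_MIN_DAYS = 30      # cool tier: data kept for at least 30 days
ARCHIVE_MIN_DAYS = 180  # archive tier: data kept for at least 180 days


def suggest_tier(days_to_keep: int, accessed_often: bool, can_wait_hours: bool) -> str:
    """Suggest an access tier (hypothetical helper, for illustration only).

    days_to_keep   -- how long the blob is expected to stay in storage
    accessed_often -- True if the data is read frequently
    can_wait_hours -- True if a retrieval delay of several hours is acceptable
    """
    if accessed_often:
        return "Hot"
    if days_to_keep >= ARCHIVE_MIN_DAYS and can_wait_hours:
        return "Archive"
    if days_to_keep >= COOL_MIN_DAYS:
        return "Cool"
    # Short-lived data stays hot, since cooler tiers assume longer retention.
    return "Hot"


def is_expired(stored_on: date, retention_days: int, today: date) -> bool:
    """Return True once the mandated retention period has run out."""
    return (today - stored_on).days > retention_days
```

For example, `suggest_tier(365, accessed_often=False, can_wait_hours=True)` suggests the archive tier, while the same data with `can_wait_hours=False` falls back to cool storage. The retention period passed to `is_expired` would come from your own regulations or policies.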

Managing data between tiers

Different data types in healthcare need to be stored for different lengths of time and may have different lifecycles as they move from hot to cool or archive storage. Azure Blob Storage lifecycle management offers a rule-based policy that can be used to transition data to the most appropriate access tier and to expire data at the end of its lifecycle.

Through lifecycle management, rules may be created to perform several actions on blobs individually.

  • Transition blobs to a cooler storage tier to optimize for performance and cost
  • Delete blobs at the end of their lifecycles
  • Define rules to be executed at set intervals at the storage account level
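To make the idea of a rule-based policy more concrete, here is a rough sketch that builds a lifecycle policy document as a Python dictionary and serializes it to JSON. The field names (`tierToCool`, `tierToArchive`, `delete`, `daysAfterModificationGreaterThan`, `prefixMatch`) follow the lifecycle management policy schema as I understand it, so check them against the current Azure documentation before use. The helper name, the rule name, the `radiology/` prefix, and the day counts are hypothetical values chosen for illustration.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int,
                           archive_after_days: int, delete_after_days: int) -> dict:
    """Build a one-rule lifecycle policy (hypothetical helper, illustration only)."""
    if not (0 < cool_after_days < archive_after_days < delete_after_days):
        raise ValueError("expected cool < archive < delete day counts")
    return {
        "rules": [
            {
                "name": "cool-down-" + prefix.strip("/"),  # hypothetical rule name
                "enabled": True,
                "type": "Lifecycle",
                "definition": {
                    # Apply the rule only to block blobs under the given prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            "tierToCool": {
                                "daysAfterModificationGreaterThan": cool_after_days
                            },
                            "tierToArchive": {
                                "daysAfterModificationGreaterThan": archive_after_days
                            },
                            "delete": {
                                "daysAfterModificationGreaterThan": delete_after_days
                            },
                        }
                    },
                },
            }
        ]
    }


# Example values only: cool after 30 days, archive after 180, delete after 7 years.
policy = build_lifecycle_policy("radiology/", 30, 180, 7 * 365)
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON could then be supplied to the storage account through the portal, PowerShell, or the CLI. The deletion interval in particular should be set from your own retention requirements rather than from this example.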

These capabilities can save a healthcare organization money when using blob storage, as the blueprint does. When you build a larger solution on the blueprint, data storage costs may go up, so pay attention to which access tier each kind of data is using.
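A back-of-the-envelope comparison can show why the access tier matters as data volumes grow. The sketch below is a simplified model under stated assumptions: the prices in `PLACEHOLDER_PRICES` are made-up placeholders and not Azure prices, and the model ignores transaction and early-deletion charges. It only captures the general trade-off that cooler tiers cost less to store and more to read. All function names are hypothetical.

```python
from typing import Dict, Tuple

# PLACEHOLDER values, NOT real Azure prices: (storage $/GB-month, retrieval $/GB).
# Replace them with figures from the current Blob Storage pricing page.
PLACEHOLDER_PRICES: Dict[str, Tuple[float, float]] = {
    "Hot": (0.020, 0.00),
    "Cool": (0.010, 0.01),
    "Archive": (0.002, 0.02),
}


def monthly_cost(size_gb: float, read_gb: float,
                 storage_price: float, retrieval_price: float) -> float:
    """Storage cost plus retrieval cost for one month (simplified model)."""
    return size_gb * storage_price + read_gb * retrieval_price


def compare_tiers(size_gb: float, read_gb: float,
                  prices: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Return the estimated monthly cost per tier, rounded to cents."""
    return {
        tier: round(monthly_cost(size_gb, read_gb, storage, retrieval), 2)
        for tier, (storage, retrieval) in prices.items()
    }


def cheapest_tier(costs: Dict[str, float]) -> str:
    """Pick the tier with the lowest estimated cost."""
    return min(costs, key=costs.get)


# Rarely read data: 1,000 GB stored, 50 GB read per month.
rarely_read = compare_tiers(1000, 50, PLACEHOLDER_PRICES)
# Heavily read data: 100 GB stored, 5,000 GB read per month.
heavily_read = compare_tiers(100, 5000, PLACEHOLDER_PRICES)
```

With these placeholder numbers, the rarely read data is cheapest in a cooler tier, while the heavily read data is cheapest in hot storage. Real results depend entirely on actual prices and access patterns.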

Recommended next steps

In addition to the Azure portal, there are several options for programmatically accessing and managing Blob Storage. PowerShell, the CLI tools, and the .NET, Java, Python, and Node.js client libraries all support blob-level tiering and archive storage.
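As one illustration of blob-level tiering from Python, the sketch below moves a blob from hot to cool once it has gone unmodified for a set number of days. It assumes a client object shaped like `BlobClient` from the `azure-storage-blob` (v12) package, using its `get_blob_properties()` and `set_standard_blob_tier()` methods. The `cool_down_blob` helper and the 30-day default are hypothetical, and `FakeBlobClient` is only a stand-in so the sketch runs without an Azure account.

```python
from datetime import datetime, timedelta, timezone
from types import SimpleNamespace


def cool_down_blob(blob_client, cool_after_days: int = 30, now: datetime = None) -> bool:
    """Move a hot blob to the cool tier if it has not been modified recently.

    blob_client is assumed to behave like azure.storage.blob.BlobClient.
    Returns True if the tier was changed.
    """
    now = now or datetime.now(timezone.utc)
    props = blob_client.get_blob_properties()
    old_enough = now - props.last_modified >= timedelta(days=cool_after_days)
    # Assumption: blob_tier compares equal to the plain string "Hot".
    if old_enough and props.blob_tier == "Hot":
        blob_client.set_standard_blob_tier("Cool")
        return True
    return False


class FakeBlobClient:
    """Stand-in for BlobClient so the example runs locally (not part of the SDK)."""

    def __init__(self, last_modified: datetime, tier: str = "Hot"):
        self._props = SimpleNamespace(last_modified=last_modified, blob_tier=tier)

    def get_blob_properties(self):
        return self._props

    def set_standard_blob_tier(self, tier: str) -> None:
        self._props.blob_tier = tier


# Usage with the stand-in client and a fixed reference date.
reference = datetime(2018, 10, 2, tzinfo=timezone.utc)
old_blob = FakeBlobClient(reference - timedelta(days=45))
new_blob = FakeBlobClient(reference - timedelta(days=3))
changed_old = cool_down_blob(old_blob, now=reference)
changed_new = cool_down_blob(new_blob, now=reference)
```

Against a real storage account, you would pass an actual `BlobClient` instead of the stand-in. For many blobs, a lifecycle management policy is usually simpler than tiering each blob from code.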

To understand the data ingestion and storage model of the Healthcare AI blueprint, install it, then weigh the Blob Storage pricing models against your own data, keeping in mind that patient length of stay is affected by the admissions and discharges on any given day.

To find more system optimizations and ensure a successful install of the Healthcare AI blueprint, read the Implementing the Azure blueprint for AI article. It takes you through the blueprint and highlights areas that need special attention, such as which Blob Storage tier to use.


Your comments and recommendations are welcome below. I regularly post on technology in healthcare topics. Reach out and connect with me on LinkedIn or Twitter.