
Azure Event Hubs Archive is now in public preview, providing efficient micro-batch processing

Long-term data archival and downstream micro-batch processing are two of the key scenarios for Event Hubs. Customers typically use compute (Event Processor Host/Event Receivers) or Stream Analytics jobs to perform these archival or batch processing tasks. These, along with other custom downstream solutions, involve significant overhead in scheduling and managing batch jobs. Why not have something out-of-the-box that solves this problem? Well, look no further – there’s now a nifty feature called Event Hubs Archive!

Azure Event Hubs is a real-time, highly scalable, and fully managed data-stream ingestion service that can ingress millions of events per second and stream them through multiple applications. This lets you process and analyze massive amounts of data produced by your connected devices and applications.

Among the key scenarios for Event Hubs are long-term data archival and downstream micro-batch processing. Customers typically use compute or other homegrown solutions to archive data or to prepare it for batch processing. These custom solutions involve significant overhead in creating, scheduling, and managing batch jobs. Why not have something out-of-the-box that solves this problem? Well, look no further – there’s now a great new feature called Event Hubs Archive!

Event Hubs Archive addresses these important requirements by archiving the data directly from Event Hubs to Azure Storage as blobs. ‘Archive’ will manage all the compute and downstream processing required to pull data into Azure Blob storage. This reduces your total cost of ownership, your setup overhead, and the management of custom jobs to do the same task, and lets you focus on your apps!

Benefits of Event Hubs Archive

  1. Simple setup

    It is extremely straightforward to configure your Event Hubs to take advantage of this feature.

  2. Reduced total cost of ownership

    Since Event Hubs handles all the management, you avoid most of the overhead of setting up and tracking your own custom job processing mechanisms.

  3. Cohesive with your Azure Storage

    By just choosing your Azure Storage account, Archive pulls the data from Event Hubs to your containers.

  4. Near-real-time batch analytics

    Archive data is available within minutes of ingress into Event Hubs. This enables the most common near-real-time analytics scenarios without having to construct separate data pipelines.

A peek inside the Event Hubs Archive

Event Hubs Archive can be enabled in one of the following ways:

  1. With a single click in the new Azure portal, on an Event Hub in your namespace

  2. With Azure Resource Manager templates

Once Archive is enabled for the Event Hub, you need to define the time and size windows for archiving.

The time window allows you to set the frequency with which the archival to Azure Blobs will happen. The frequency range is configurable from 60 – 900 seconds (1 – 15 minutes), both inclusive, with a granularity of 1 second. The default setting is 300 seconds (5 minutes).

The size window defines the amount of data built up in your Event Hub before an archival operation. The size range is configurable from 10 MB to 500 MB (10485760 – 524288000 bytes), both inclusive, at byte-level granularity.

The archive operation will kick in when either the time or the size window is exceeded, whichever happens first. After the time and size windows are set, the next step is to configure the destination, which is a storage account of your choosing.
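The window rules above can be sketched in a few lines of Python. This is only an illustration of the limits and the "whichever comes first" trigger described in this post, not the service's implementation. The class name `ArchiveWindow`, its field names, and the `should_archive` helper are hypothetical, and treating a window as exceeded once the limit is reached is an assumption.

```python
from dataclasses import dataclass

# Limits quoted in this post (both ends inclusive).
MIN_INTERVAL_S, MAX_INTERVAL_S = 60, 900
MIN_SIZE_BYTES = 10 * 1024 * 1024    # 10485760 bytes
MAX_SIZE_BYTES = 500 * 1024 * 1024   # 524288000 bytes


@dataclass(frozen=True)
class ArchiveWindow:
    """Hypothetical holder for the two Archive window settings."""

    size_limit_bytes: int          # the post states no default, so it is required here
    interval_seconds: int = 300    # default time window from the post

    def __post_init__(self):
        # Reject settings outside the documented ranges.
        if not MIN_INTERVAL_S <= self.interval_seconds <= MAX_INTERVAL_S:
            raise ValueError("time window must be 60-900 seconds")
        if not MIN_SIZE_BYTES <= self.size_limit_bytes <= MAX_SIZE_BYTES:
            raise ValueError("size window must be 10485760-524288000 bytes")

    def should_archive(self, elapsed_seconds: float, buffered_bytes: int) -> bool:
        # Assumption: a window counts as exceeded once its limit is reached.
        return (elapsed_seconds >= self.interval_seconds
                or buffered_bytes >= self.size_limit_bytes)
```

For instance, `ArchiveWindow(size_limit_bytes=MIN_SIZE_BYTES)` keeps the 5 minute default and would report an archive as due after 300 seconds even if very little data has arrived.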

That’s it! You’ll soon see blobs being created in the specified Azure Storage account’s container.

The blobs are created with the following naming convention:

<Namespace>/<EventHub>/<Partition>/<YYYY>/<MM>/<DD>/<HH>/<mm>/<ss>

For example: Myehns/myhub/0/2016/07/20/09/02/15. The blobs are in standard Avro format.
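Downstream jobs often need to build or take apart these blob names. The sketch below does both in Python, assuming the nine-part layout suggested by the example path: namespace, Event Hub, partition, then year, month, day, hour, minute, and second. The names `ArchiveBlobName`, `build_blob_name`, and `parse_blob_name` are hypothetical helpers, not part of any SDK. The timestamp is kept as a naive `datetime` because this post does not state the time zone.

```python
from datetime import datetime
from typing import NamedTuple

# Assumed layout of the timestamp part of the name: YYYY/MM/DD/HH/mm/ss.
_TIME_FORMAT = "%Y/%m/%d/%H/%M/%S"


class ArchiveBlobName(NamedTuple):
    """Hypothetical parsed form of an Archive blob name."""

    namespace: str
    event_hub: str
    partition: str
    window_time: datetime  # naive; the time zone is not stated in the post


def build_blob_name(namespace: str, event_hub: str, partition, window_time: datetime) -> str:
    """Join the parts into namespace/hub/partition/YYYY/MM/DD/HH/mm/ss."""
    return "/".join([namespace, event_hub, str(partition),
                     window_time.strftime(_TIME_FORMAT)])


def parse_blob_name(name: str) -> ArchiveBlobName:
    """Split a blob name into its parts, raising ValueError on a bad layout."""
    parts = name.strip("/").split("/")
    if len(parts) != 9:
        raise ValueError(f"expected 9 path segments, got {len(parts)}")
    namespace, event_hub, partition = parts[:3]
    # strptime also raises ValueError if the date or time fields are invalid.
    window_time = datetime.strptime("/".join(parts[3:]), _TIME_FORMAT)
    return ArchiveBlobName(namespace, event_hub, partition, window_time)
```

Calling `parse_blob_name("Myehns/myhub/0/2016/07/20/09/02/15")` yields the namespace, hub, and partition as strings plus a `datetime` for 20 July 2016 at 09:02:15, which makes it easy to select blobs for a given partition or time range.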

If there is no event data in the specified time and size window, Archive will create empty blobs.

Pricing

Archive will be an option when creating an Event Hub in a namespace and will be limited to one per Event Hub. The Archive charge will be added to the Throughput Unit charge and thus will be based on the number of throughput units selected for the Event Hub.

Opting in to Archive will involve 100% egress of ingested data, and the cost of storage is not included. This implies that the cost is primarily for compute (hey, we are handling all this for you!).

Next Steps?

Learn all about this new feature: Event Hubs Archive

Use templates to enable the feature on your Event Hub: Enable Archive using Azure Resource Manager

Check out the price details on Azure Event Hubs pricing.

Let us know what you think about newer sinks and newer serialization formats.

Start enjoying this feature, available today.

If you have any questions or suggestions, leave us a comment below.