Apache Storm for HDInsight

Real-time stream processing made easy for big data

What is Apache Storm?

Apache Storm is a distributed, fault-tolerant, open-source, real-time event processing solution for large, fast streams of data. First made famous by Twitter, which used the technology on its massive tweet streams, Storm is now a project of the Apache Software Foundation. The Azure cloud makes Apache Storm easy and cost-effective to deploy, with no hardware to buy, no software to configure, your choice of development tools (Java or C#), and deep integration with Visual Studio.
Data comes in from various sources (applications, devices, sensors, web, social) and is collected in the cloud through web APIs or field gateways. The data is put into a queueing service like Event Hubs, Kafka, RabbitMQ, or ActiveMQ, for real-time data processing with Apache Storm on HDInsight. The data moves to long-term storage with Apache HBase on HDInsight, where you can run your real-time dashboards, queries, and analytics.

Real-time processing for real-time challenges

Today’s connected world is defined by big data that arrives in real time. Storm is ideal for challenging real-time scenarios like fraud detection, click-stream analysis, financial alerts, telemetry from connected sensors and devices (IoT), social analytics, "always on" ETL pipelines, and network monitoring. Customers can source these real-time events from devices, sensors, infrastructure, applications, websites, and other data sources.

Easy setup, fast results

With Storm for HDInsight, there’s no time-consuming installation or setup. Azure does it for you. You’ll be up and running in minutes, and can deploy Storm without buying new hardware or incurring other up-front costs.

Integrated development environment for easier and faster results

Storm is simple to use and supports any programming language, including Java and .NET languages such as C#. Built-in integration with the Visual Studio IDE means that you can develop, deploy, and debug Storm topologies quickly and easily. You can even mix spouts and bolts written in different languages within a single topology, so you can reuse the vast universe of existing spouts and bolts.

Elastic capacity for big data

Storm for HDInsight leverages the power of the Azure cloud, making it easier to create clusters of any size to process any amount of data on demand. We charge only for the compute and storage you actually use.

High availability for guaranteed business continuity

Storm is fault tolerant, and automatically restarts workers on other nodes in case of failure. Storm for HDInsight takes this a step further by guaranteeing 99.9% uptime for your Storm clusters. Azure also offers 24x7 enterprise support and cluster monitoring.

Deploy your first Apache Storm analytics pipeline

Deploying an Apache Storm cluster and running your first real-time analytics pipeline can be done in minutes.

Use your Azure subscription or create a trial account to log on to the Azure portal.

Give a name to the Storm cluster, and pick the number of nodes to define the size of the cluster. You can deploy a Storm cluster from 1 node all the way to hundreds of nodes. We also allow you to scale up or scale down a running Storm cluster.

It usually takes about 15 minutes to deploy a Storm cluster. Once it is deployed, click STORM DASHBOARD at the bottom of the page to deploy your first Storm topology.

Provide the username and password that you chose when creating the cluster.

From the drop-down list, either pick one of the sample topologies or upload a new topology, which must be compiled as a JAR file.

Click Submit to deploy the WordCount topology. This topology counts the number of times each word appears in a stream of sentences that arrive as input.

Once the submission is completed, you can click Storm UI to monitor the running topology.
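
To give a feel for what the word-count sample computes, here is a minimal plain-Java sketch of the logic. It does not use the Storm API and is not the code of the sample topology itself. The class and method names (WordCountSketch, splitSentence, countWords) are hypothetical, and it assumes that words are separated by whitespace and compared without regard to case. In a real topology, a spout would emit the sentences and the two steps would typically run as separate bolts spread across the cluster.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for the word-count logic; no Storm classes are used.
class WordCountSketch {

    // Plays the role of a "split" step: one sentence in, individual words out.
    // Assumption: words are whitespace-separated and case is ignored.
    static List<String> splitSentence(String sentence) {
        List<String> words = new ArrayList<>();
        for (String token : sentence.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return words;
    }

    // Plays the role of a "count" step: keeps a running total for each word.
    static Map<String, Integer> countWords(List<String> sentences) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : sentences) {
            for (String word : splitSentence(sentence)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // A fixed list stands in for the unbounded stream a spout would emit.
        List<String> sentences = List.of(
            "the cow jumped over the moon",
            "the moon is bright");
        System.out.println(countWords(sentences));
    }
}
```

Unlike this batch-style sketch, a running topology never reaches the end of its input, so the counts you see in the Storm UI keep changing as new sentences arrive.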

It's easy to build, deploy, and manage Storm topologies entirely from within Visual Studio. The Azure SDK also ships with easy-to-get-started templates for Storm on HDInsight. This integrated experience increases productivity and lets you handle full project management without leaving the IDE.

Try HDInsight for free