Scale elastically on demand
HDInsight is a Hadoop distribution powered by the cloud. This means HDInsight was architected to handle any amount of data, scaling from terabytes to petabytes on demand. You can spin up any number of nodes at any time. We charge only for the compute and storage you actually use.
It's part of our audit requirements that we keep data for seven years, and some information has to be retained for as long as 30 years. With HDInsight, we can store more data and query it as needed.
–Don Wood, Beth Israel Deaconess Medical Center
Crunch all data – structured, semi-structured, unstructured
Because it's 100% Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors, and more. This lets you analyze new sets of data and uncover new business possibilities to drive your organization forward.
With a solution based on SQL Server and Azure HDInsight Service, we can capture data written in plain English and use it to improve services…This will reinvent the way we work with medical records in the future.
–Paul Henderson, Ascribe
Develop in your favorite language
HDInsight has powerful programming extensions for languages including C#, Java, .NET, and more. You can use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs.
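To illustrate what a language-agnostic Hadoop job can look like, here is a minimal sketch in Python, assuming a Hadoop Streaming-style job in which the mapper and reducer exchange tab-separated text records. Python is not one of the languages named above, and the function names `map_lines`, `reduce_lines`, and `run_local` are hypothetical. The sketch runs in a single local process and does not submit anything to a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Mapper: emit one 'word<TAB>1' record for every word in the input."""
    for line in lines:
        for word in line.lower().split():
            yield f"{word}\t1"


def reduce_lines(sorted_records: Iterable[str]) -> Iterator[str]:
    """Reducer: sum the counts for each run of identical keys.

    Assumes records arrive sorted by key, as they would after the
    shuffle-and-sort phase of a MapReduce job.
    """
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_local(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in one process (local testing only)."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in run_local(["the quick fox", "The lazy dog"]):
        print(record)
```

In a real streaming job, the mapper and reducer would typically be separate scripts that read from standard input and write to standard output; combining them here only makes the data flow easy to follow.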
No hardware to acquire or maintain
With HDInsight, you can deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There's also no time-consuming installation or setup. Azure does it for you. You can launch your first cluster in minutes.
Because we're on an elastic cloud with Windows Azure, we don’t have to worry about setting up infrastructure or whether we can sustain growth with the current capacity in our data centers.
–Sujatha Bayyapureddy, McKesson
Use Excel to visualize your Hadoop data
Because it's integrated with Excel, HDInsight lets you visualize and analyze your Hadoop data in compelling new ways in a tool familiar to your business users. From Excel, users can select Azure HDInsight as a data source.
I looked at some of the other BI solutions on the market, and most were overly complex, especially from an end-user point of view.
–Andrew Cheong, BlackBall
Connect on-premises Hadoop clusters with the cloud
HDInsight is also integrated with Hortonworks Data Platform, so you can move Hadoop data from an on-site datacenter to the Azure cloud for backup, dev/test, and cloud bursting scenarios. Using the Microsoft Analytics Platform System, you can even query your on-premises and cloud-based Hadoop clusters at the same time.
Customize clusters to run other Hadoop projects
The Hadoop ecosystem is a portfolio of open source projects that are evolving quickly. To give customers flexibility, HDInsight has the option to deploy arbitrary Hadoop projects through custom scripts. This includes popular projects like Spark, R, Giraph, and Solr.
Includes NoSQL transactional capabilities
HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This lets you run large-scale transactional processing (OLTP) on nonrelational data, enabling use cases such as interactive websites or sensor data that write to Azure Blob storage.
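For readers unfamiliar with this style of database, the following toy sketch shows the general shape of a wide-column data model: each row is identified by a row key, and its values are addressed by a column family plus a qualifier. It is an in-memory illustration only, not the HBase client API; the class `TinyWideColumnTable` and its methods are hypothetical, and it leaves out cell versioning, persistence, and distribution.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional


class TinyWideColumnTable:
    """Toy in-memory table: row key -> 'family:qualifier' -> value."""

    def __init__(self, families: Iterable[str]) -> None:
        # Column families are fixed up front; qualifiers are free-form.
        self.families = set(families)
        self.rows: Dict[str, Dict[str, str]] = defaultdict(dict)

    def put(self, row_key: str, family: str, qualifier: str, value: str) -> None:
        """Write a single cell, rejecting families the table does not declare."""
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][f"{family}:{qualifier}"] = value

    def get(self, row_key: str, family: str, qualifier: str) -> Optional[str]:
        """Read a single cell, or None if the row or column is absent."""
        return self.rows.get(row_key, {}).get(f"{family}:{qualifier}")


if __name__ == "__main__":
    table = TinyWideColumnTable(["reading"])
    table.put("sensor-1", "reading", "temperature", "21.5")
    print(table.get("sensor-1", "reading", "temperature"))
```

Rows are sparse in this model: a row stores only the columns that were actually written, which is one reason the layout suits sensor and web data where the fields vary from record to record.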
Provide real-time stream processing
HDInsight includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale. This lets you process millions of events as they are generated, enabling use cases such as Internet of Things (IoT) and gaining insights from your connected devices or web-triggered events. We make deploying and implementing Storm easier.
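As a rough illustration of the kind of computation a stream processor performs, the sketch below counts events per device in fixed, non-overlapping time windows as the events arrive. It is plain Python running in one process, not Storm's API; the function name `count_by_window`, the `(timestamp, device_id)` event shape, and the 60-second default window are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def count_by_window(
    events: Iterable[Tuple[float, str]],
    window_seconds: float = 60.0,
) -> Dict[int, Counter]:
    """Count events per device within fixed (tumbling) time windows.

    Each event is a (timestamp_in_seconds, device_id) pair. Events are
    consumed one at a time, so the input can be any iterable, including
    a generator fed by a live source.
    """
    windows: Dict[int, Counter] = defaultdict(Counter)
    for timestamp, device_id in events:
        # Window 0 covers [0, window_seconds), window 1 the next span, etc.
        window_index = int(timestamp // window_seconds)
        windows[window_index][device_id] += 1
    return dict(windows)


if __name__ == "__main__":
    sample = [(0.5, "device-a"), (59.9, "device-a"), (60.0, "device-b")]
    for index, counts in sorted(count_by_window(sample).items()):
        print(index, dict(counts))
```

A distributed stream processor spreads this kind of work across many machines and handles failures and late data; the sketch shows only the per-event windowing logic.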
Deploy to Windows and Linux
Select Linux or Windows clusters when deploying Big Data workloads into Microsoft Azure. With Windows, leverage existing Windows-based code, including .NET, to scale over all of your data in Azure. With Linux, customers can more easily move existing Hadoop workloads into the cloud and incorporate additional Big Data components that can run in the service. By offering a choice of Windows and Linux clusters, Microsoft gives customers more flexibility to create insight from the massive amounts of data being created in the cloud, using the OS of their choice.