HDInsight

A managed Apache Hadoop, Spark, R, HBase, and Storm cloud service made easy

Comprehensive set of managed Apache big data projects

Scale elastically on demand

Azure HDInsight is an Apache Hadoop distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes to petabytes on demand. Spin up any number of nodes at any time. We charge only for the compute and storage that you use.

Beth Israel Deaconess Medical Center
It's part of our audit requirements that we keep data for seven years, and some information has to be retained for as long as 30 years. With HDInsight, we can store more data and query it as needed.

–Don Wood, Beth Israel Deaconess Medical Center

Crunch all data—structured, semi-structured, unstructured

Because it's 100 percent Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors, and more. This lets you analyze new sets of data and uncover new business possibilities that drive your organization forward.
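As a rough illustration of what processing semi-structured data can look like, the sketch below applies a MapReduce-style count to a few server log lines in plain Python. The log format and the `map_line`, `reduce_counts`, and `count_hits` helpers are assumptions made for this example; it runs locally and does not call any HDInsight or Hadoop API.

```python
from collections import defaultdict

# Hypothetical server log lines: timestamp, HTTP method, path, status code.
LOG_LINES = [
    "2016-01-01T10:00:00 GET /home 200",
    "2016-01-01T10:00:01 GET /cart 500",
    "2016-01-01T10:00:02 POST /cart 200",
    "malformed line",
    "2016-01-01T10:00:03 GET /home 200",
]


def map_line(line):
    """Map step: emit (path, 1) for each well-formed line, skip the rest."""
    parts = line.split()
    if len(parts) != 4:
        return []  # tolerate malformed records instead of failing the job
    _timestamp, _method, path, _status = parts
    return [(path, 1)]


def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each key."""
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return dict(totals)


def count_hits(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = [pair for line in lines for pair in map_line(line)]
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_hits(LOG_LINES))
```

On a real cluster the map and reduce steps would run in parallel across nodes; the point here is only that records without a fixed schema can be parsed, filtered, and aggregated in the same pass.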

Ascribe
With a solution based on SQL Server and the Azure HDInsight service, we can capture data written in plain English and use it to improve services. This will reinvent the way we work with medical records in the future.

–Paul Henderson, Ascribe

Develop in your favorite language

HDInsight has powerful programming extensions for languages and frameworks including C#, Java, and .NET. Use your programming language of choice on Hadoop to create, configure, submit, and monitor Hadoop jobs.

Skip the hardware purchase and maintenance

With HDInsight, deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or setup. Azure does it for you. Launch your first cluster in minutes.

McKesson
Because we're on an elastic cloud with Azure, we don't have to worry about setting up infrastructure or whether we can sustain growth with the current capacity in our data centers.

–Sujatha Bayyapureddy, McKesson

Use Excel or your favorite BI tool to visualize Hadoop data

Because it's integrated with Excel, HDInsight lets you visualize and analyze your Hadoop data in compelling new ways using a tool that's familiar to your business users. From Excel, users can select HDInsight as a data source.

BlackBall
I looked at some of the other BI solutions on the market, and most were overly complex, especially from an end-user point of view.

–Andrew Cheong, BlackBall

Connect on-premises Hadoop clusters with the cloud

HDInsight is also integrated with Hortonworks Data Platform, letting you move Hadoop data from an on-site datacenter to the Azure cloud for backup, Dev/Test, and cloud-bursting scenarios. Using the Microsoft Analytics Platform System, you can even query your on-premises and cloud-based Hadoop clusters at the same time.

Customize clusters to run other Hadoop projects

The Apache Hadoop ecosystem is a portfolio of rapidly evolving open-source projects. HDInsight gives you the flexibility to deploy arbitrary Hadoop projects through custom scripts. This includes popular projects like Spark, R, Giraph, and Solr.

Use NoSQL transactional capabilities

HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This lets you do large-scale online transaction processing (OLTP) on non-relational data, enabling use cases such as interactive websites or sensor data written to Azure Blob Storage.
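To make the data model more concrete, here is a minimal in-memory sketch of the row key, column, and timestamped-version layout that a wide-column store of this kind uses. The `TinyWideColumnTable` class and its `put` and `get` methods are hypothetical names for illustration only; this is not the HBase client API and it stores nothing on HDFS.

```python
from collections import defaultdict


class TinyWideColumnTable:
    """Toy table: row key -> "family:qualifier" -> list of (timestamp, value)."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column, value, timestamp):
        """Append a new version of a cell; older versions are kept."""
        self._rows[row_key][column].append((timestamp, value))

    def get(self, row_key, column):
        """Return the value with the newest timestamp, or None if absent."""
        versions = self._rows.get(row_key, {}).get(column)
        if not versions:
            return None
        return max(versions, key=lambda version: version[0])[1]


if __name__ == "__main__":
    table = TinyWideColumnTable()
    # Hypothetical sensor readings keyed by device id.
    table.put("sensor-1", "reading:temp", 20.5, timestamp=1)
    table.put("sensor-1", "reading:temp", 21.0, timestamp=2)
    print(table.get("sensor-1", "reading:temp"))
```

Reads and writes are addressed by row key, which is what makes this layout suit many small, fast lookups such as those behind an interactive website.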

Provide real-time stream processing

HDInsight includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale. This lets you process millions of events as they’re generated, enabling use cases like Internet of Things (IoT) and gaining insights from your connected devices or web-triggered events. We make deploying and implementing Storm easier. Learn more about Storm
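The idea of processing events as they are generated can be sketched with a fixed-size (tumbling) window count in plain Python. The `sensor_spout` and `window_counts` names, the event tuples, and the five-second window are assumptions for this example; the code does not use the Storm API and assumes events arrive in timestamp order.

```python
from collections import Counter


def sensor_spout():
    """Stand-in for a stream source: yields (timestamp_seconds, device_id)."""
    events = [(0, "a"), (1, "b"), (4, "a"), (5, "a"), (9, "b"), (10, "a")]
    for event in events:
        yield event


def window_counts(events, window_seconds):
    """Count events per device in fixed windows, emitting each window as it closes.

    Windows that contain no events are not emitted.
    """
    current_start = None
    counts = Counter()
    for timestamp, device_id in events:
        start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = start
        if start != current_start:
            yield current_start, counts
            current_start, counts = start, Counter()
        counts[device_id] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last, still-open window


if __name__ == "__main__":
    for window_start, per_device in window_counts(sensor_spout(), 5):
        print(window_start, dict(per_device))
```

Because `window_counts` is a generator, each window's result is available as soon as the first event of the next window arrives, rather than after the whole input has been read.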

Use Spark for interactive analysis

HDInsight includes Apache Spark, an open-source project in the Apache ecosystem that can run large-scale data analytics applications in memory. Spark can run queries up to 100x faster than traditional disk-based big data queries. It provides a common execution model for tasks like ETL, batch queries, interactive queries, real-time streaming, machine learning, and graph processing on data stored in Azure Storage. Learn more about Spark

Use R to support predictive modeling and machine learning

HDInsight incorporates R Server for Hadoop, a scale-out implementation of one of the most popular programming languages for statistical computing and machine learning. R Server on HDInsight is a cloud implementation of 100 percent open-source R integrated with Hadoop and Spark clusters. It gives you the familiarity of R with the scalability and performance of Hadoop. Learn more about R Server for HDInsight

Deploy to Windows and Linux

Select Linux or Windows clusters when deploying big data workloads into Azure. With Windows, use existing Windows-based code, including .NET, to scale over all of your data in Azure. With Linux, you can more easily move existing Hadoop workloads into the cloud and incorporate additional big data components that can run in the service. By offering both Windows and Linux clusters, Microsoft gives you the flexibility to use the operating system of your choice, gaining insights from the massive amounts of data being created in the cloud.

*Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation.

Customers building Hadoop in Azure