Azure HDInsight is an Apache Hadoop distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes to petabytes on demand. Spin up any number of nodes at any time. We charge only for the compute and storage that you use.
Because it's 100 percent Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors, and more. This lets you analyze new sets of data and uncover new business possibilities that drive your organization forward.
With HDInsight, deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or set up. Azure does it for you. Launch your first cluster in minutes.
Because it's integrated with Excel, HDInsight lets you visualize and analyze your Hadoop data in compelling new ways using a tool that's familiar to your business users. From Excel, users can select HDInsight as a data source.
HDInsight is also integrated with Hortonworks Data Platform, letting you move Hadoop data from an on-site datacenter to the Azure cloud for backup, Dev/Test, and cloud-bursting scenarios. Using the Microsoft Analytics Platform System, you can even query your on-premises and cloud-based Hadoop clusters at the same time.
The Apache Hadoop ecosystem is a portfolio of fast-moving open-source projects that are evolving quickly. HDInsight gives you the flexibility to deploy arbitrary Hadoop projects through custom scripts. This includes popular projects like Spark, R, Giraph, and Solr.
HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This lets you do large transactional processing (OLTP) of non-relational data, enabling use cases like interactive websites or having sensor data write to Azure Blob Storage.
HDInsight includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale. This lets you process millions of events as they’re generated, enabling use cases like Internet of Things (IoT) and gaining insights from your connected devices or web-triggered events. We make deploying and implementing Storm easier. Learn more about Storm
HDInsight includes Apache Spark, an open-source project in the Apache ecosystem that can run large-scale data analytics applications in memory. Spark delivers queries up to 100x faster than traditional big data queries. It provides a common execution model for tasks like ETL, batch queries, interactive queries, real-time streaming, machine learning, and graph processing on data stored in Azure Storage. Learn more about Spark
Select Linux or Windows clusters when deploying big data workloads into Azure. With Windows, use existing Windows-based code, including .NET, to scale over all of your data in Azure. With Linux, you can more easily move existing Hadoop workloads into the cloud and incorporate additional big data components that can run in the service. By offering both Windows and Linux clusters, Microsoft gives you the flexibility to use the operating system of your choice, gaining insights from the massive amounts of data being created in the cloud.