Azure HDInsight is an Apache Hadoop distribution powered by the cloud. This means that it handles any amount of data, scaling from terabytes to petabytes on demand. Spin up any number of nodes at any time. We only charge for the compute and storage that you use.
Because it’s 100 per cent Apache Hadoop, HDInsight can process unstructured or semi-structured data from web clickstreams, social media, server logs, devices and sensors and more. This lets you analyse new sets of data and uncover new business possibilities that drive your organisation forwards.
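To make the processing model concrete, here is a minimal sketch of the map and reduce steps that Hadoop-style jobs apply to semi-structured data such as server logs. It is plain Python that runs on a single machine, not HDInsight or Hadoop API code, and the log format, the sample lines and the function names (`map_line`, `reduce_counts`, `count_statuses`) are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical sample of web server log lines: "<ip> <method> <path> <status>".
SAMPLE_LOG = [
    "10.0.0.1 GET /index.html 200",
    "10.0.0.2 GET /missing 404",
    "10.0.0.1 POST /login 200",
    "malformed line",
    "10.0.0.3 GET /index.html 200",
]


def map_line(line):
    """Map step: emit (status, 1) for a well-formed line, nothing otherwise."""
    fields = line.split()
    if len(fields) == 4 and fields[3].isdigit():
        yield fields[3], 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def count_statuses(lines):
    """Run the map step over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_line(line))
    return reduce_counts(pairs)


if __name__ == "__main__":
    print(count_statuses(SAMPLE_LOG))
```

On a real cluster the map step would run in parallel across many nodes and the framework would group the emitted pairs by key before the reduce step; the sketch only shows the shape of the computation and how malformed records can be skipped.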
With HDInsight, deploy Hadoop in the cloud without buying new hardware or incurring other up-front costs. There’s also no time-consuming installation or setup. Azure does it for you. Launch your first cluster in minutes.
Because it’s integrated with Excel, HDInsight lets you visualise and analyse your Hadoop data in compelling new ways, using a tool that’s familiar to your business users. From Excel, users can select HDInsight as a data source.
HDInsight is also integrated with Hortonworks Data Platform, letting you move Hadoop data from an on-site data centre to the Azure cloud for backup, Dev/Test and cloud-bursting scenarios. Using the Microsoft Analytics Platform System, you can even query your on-premises and cloud-based Hadoop clusters at the same time.
The Apache Hadoop ecosystem is a portfolio of open-source projects that evolve quickly. HDInsight gives you the flexibility to deploy arbitrary Hadoop projects through custom scripts. This includes popular projects such as Spark, R, Giraph and Solr.
HDInsight also includes Apache HBase, a columnar NoSQL database that runs on top of the Hadoop Distributed File System (HDFS). This lets you run large-scale transactional processing (OLTP) of non-relational data, enabling use cases such as interactive websites or sensor data written to Azure Blob Storage.
HDInsight includes Apache Storm, an open-source stream analytics platform that can process real-time events at large scale. This lets you process millions of events as they’re generated, enabling Internet of Things (IoT) scenarios and letting you gain insights from your connected devices or web-triggered events. We make deploying and implementing Storm easier.
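As a rough illustration of the kind of computation a stream analytics job performs, the sketch below counts events per device in fixed (tumbling) time windows. It is plain Python over an in-memory list, not Storm code: a real Storm topology processes events continuously as they arrive, and the event shape `(timestamp, device_id)` and the function name `tumbling_window_counts` are hypothetical.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per device in fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_in_seconds, device_id) pairs.
    Returns {window_start: {device_id: count}}.
    """
    windows = {}
    for timestamp, device_id in events:
        # Round the timestamp down to the start of its window.
        window_start = int(timestamp // window_seconds) * window_seconds
        windows.setdefault(window_start, Counter())[device_id] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


if __name__ == "__main__":
    # Hypothetical sensor events: (seconds since start, device id).
    sample = [(0, "a"), (3, "b"), (9, "a"), (10, "a"), (14, "b"), (21, "a")]
    print(tumbling_window_counts(sample, window_seconds=10))
```

Each event is assigned to exactly one window by rounding its timestamp down, which keeps the per-window state small and lets the windows be processed independently.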
HDInsight includes Apache Spark, an open-source project in the Apache ecosystem that can run large-scale data analytics applications in-memory. Spark can deliver queries up to 100 times faster than traditional big data queries. It provides a common execution model for tasks such as ETL, batch queries, interactive queries, real-time streaming, machine learning and graph processing on data stored in Azure Storage.
Select Linux or Windows clusters when deploying big data workloads into Azure. With Windows, use existing Windows-based code, including .NET, to scale over all your data in Azure. With Linux, you can more easily move existing Hadoop workloads into the cloud and incorporate additional big data components that can run in the service. By offering both Windows and Linux clusters, Microsoft gives you the flexibility to use the operating system of your choice as you gain insights from the massive amounts of data being created in the cloud.