
Azure HDInsight

Enterprise-ready, managed cluster service for open-source analytics

Manage your big data needs in an open-source platform

Run popular open-source frameworks—including Apache Hadoop, Spark, Hive, Kafka, and more—using Azure HDInsight, a customizable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud.

Open-source projects and clusters are easy to spin up quickly without the need to install hardware or manage infrastructure

Big data clusters reduce costs through autoscaling and pricing tiers that allow you to pay for only what you use

Enterprise-grade security and industry-leading compliance with more than 30 certifications help protect your data

Optimized components for open-source technologies such as Hadoop and Spark keep you up to date

Build your projects in an open-source ecosystem

Stay up to date with the newest releases of open-source frameworks, including Kafka, HBase, and Hive LLAP. HDInsight supports the latest open-source projects from the Apache Hadoop and Spark ecosystems.

Integrate natively with Azure services

Build your data lake through seamless integration with Azure data storage solutions and services including Azure Synapse Analytics, Azure Cosmos DB, Azure Data Lake Storage, Azure Blob Storage, Azure Event Hubs, and Azure Data Factory. Control costs by choosing from a wide variety of virtual machines and by leveraging load- or schedule-based autoscaling features. Monitor your entire data lake using Azure Monitor dashboards.

Get the flexibility of multiple languages and tools

Use your preferred productivity tools, including Visual Studio, Eclipse, IntelliJ, Jupyter, and Zeppelin. Write code in familiar languages such as Scala, Python, R, JavaScript, and .NET.

End-to-end security for analytics workloads

  • Secure your cluster with virtual network isolation and control outbound traffic using Azure Firewall and VNet.
  • Sign in using your corporate domain credentials with Azure Active Directory (Azure AD) and multifactor authentication.
  • Enforce fine-grained authorization policies using Apache Ranger. Enjoy the benefits of data masking and row-level filtering.
  • Protect your data end to end by using your own encryption keys and encryption in transit.

Pay for only what you need

HDInsight offers a broad range of memory- or compute-optimized platforms (virtual machines). Choose the one that best suits your performance and cost requirements.

Trusted by companies of all sizes

Myntra accelerates its digital transformation

Myntra has worked closely with Microsoft to migrate its platform, from supply chain management to inventory to site capabilities, to Azure for trusted, always-on, hyperscale, and cost-effective computing.

Gap Inc. accelerates its digital transformation

By building and centralizing its data platform on Azure, Gap Inc. can now apply advanced analytics and machine learning to gain a comprehensive understanding of customers across channels in all brands in its portfolio.

Frequently asked questions about HDInsight

  • You would benefit from Azure HDInsight if you use custom code to process and analyze extremely large datasets with the latest big data processing frameworks such as Spark, Hadoop, Hive, Kafka, or HBase. Azure HDInsight gives you full control over the configuration of your clusters and the software installed on them. You might also consider HDInsight if you are migrating Hortonworks, Cloudera, or MapR clusters from on-premises environments or other clouds.
  • Azure HDInsight can be used for a variety of big data processing scenarios. The data can be historical (already collected and stored) or real-time (streamed directly from the source). The scenarios for processing such data fall into the following categories: batch processing (ETL), data warehousing, Internet of Things (IoT), data science, and hybrid.
  • To learn more about HDInsight cluster types and provisioning methods, read our documentation about how to set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more.
