Enterprise-ready, managed cluster service for open-source analytics
Manage your big data needs in an open-source platform
Run popular open-source frameworks—including Apache Hadoop, Spark, Hive, Kafka, and more—using Azure HDInsight, a customizable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud.
Spin up open-source projects and clusters quickly, without installing hardware or managing infrastructure
Big data clusters reduce costs through autoscaling and pricing tiers that allow you to pay for only what you use
Enterprise-grade security and industry-leading compliance, with more than 30 certifications, help protect your data
Optimized components for open-source technologies such as Hadoop and Spark keep you up to date
Build your projects in an open-source ecosystem
Stay up to date with the newest releases of open-source frameworks, including Kafka, HBase, and Hive LLAP. HDInsight supports the latest open-source projects from the Apache Hadoop and Spark ecosystems.
Integrate natively with Azure services
Build your data lake through seamless integration with Azure data storage solutions and services including Azure Synapse Analytics, Azure Cosmos DB, Azure Data Lake Storage, Azure Blob Storage, Azure Event Hubs, and Azure Data Factory. Control costs by choosing from a wide variety of virtual machines and by leveraging load- or schedule-based autoscaling features. Monitor your entire data lake using Azure Monitor dashboards.
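Autoscale is configured on the cluster's worker nodes. The Python sketch below builds either kind of autoscale block. It assumes the worker-node `autoscale` shape used in HDInsight ARM templates: `capacity` with `minInstanceCount`/`maxInstanceCount` for load-based scaling, and `recurrence` with a `timeZone` and a list of `schedule` entries for schedule-based scaling. The helper names (`load_based_autoscale`, `schedule_based_autoscale`, `worker_role`) are hypothetical. The rest of the cluster template (VM sizes, OS profile, storage) is omitted. Check the field names against the current HDInsight template reference before deploying.

```python
import json


def load_based_autoscale(min_nodes: int, max_nodes: int) -> dict:
    """Load-based block: worker nodes scale between the bounds based on load.
    Field names follow the assumed ARM template shape described above."""
    if not 0 < min_nodes <= max_nodes:
        raise ValueError("need 0 < min_nodes <= max_nodes")
    return {"capacity": {"minInstanceCount": min_nodes, "maxInstanceCount": max_nodes}}


def schedule_based_autoscale(time_zone: str, rules: list) -> dict:
    """Schedule-based block: set a fixed worker-node count at given times.

    Each rule is a (days, "HH:MM", node_count) tuple. A schedule entry pins
    the count, so min and max instance counts are set to the same value.
    """
    schedule = []
    for days, time, count in rules:
        schedule.append({
            "days": list(days),
            "timeAndCapacity": {
                "time": time,
                "minInstanceCount": count,
                "maxInstanceCount": count,
            },
        })
    return {"recurrence": {"timeZone": time_zone, "schedule": schedule}}


def worker_role(target_nodes: int, autoscale: dict) -> dict:
    """Minimal worker-node role fragment (hypothetical helper; other role
    settings such as hardware profile are intentionally left out)."""
    return {"name": "workernode", "targetInstanceCount": target_nodes, "autoscale": autoscale}


if __name__ == "__main__":
    weekdays = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"]
    # Scale up for business hours and back down in the evening (illustrative values).
    role = worker_role(4, schedule_based_autoscale(
        "Pacific Standard Time",
        [(weekdays, "09:00", 10), (weekdays, "18:00", 3)],
    ))
    print(json.dumps(role, indent=2))
```

The sketch treats the two block types as alternatives. Use load-based scaling when demand is unpredictable. Use schedule-based scaling when usage follows a known daily or weekly pattern.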
Get the flexibility of multiple languages and tools
End-to-end security for analytics workloads
- Secure your cluster with virtual network (VNet) isolation, and control outbound traffic from the VNet using Azure Firewall.
- Sign in using your corporate domain credentials with Azure Active Directory (Azure AD) and multifactor authentication.
- Enforce fine-grained authorization policies using Apache Ranger. Enjoy the benefits of data masking and row-level filtering.
- Protect data end to end with encryption at rest using your own encryption keys, plus encryption in transit.
Pay for only what you need
HDInsight offers a broad range of memory- or compute-optimized platforms (virtual machines). Choose the one that best suits your performance and cost requirements.
HDInsight resources and documentation
Get started with learning resources
Explore popular developer resources
Trusted by companies of all sizes
Myntra accelerates its digital transformation
Myntra has worked closely with Microsoft to migrate its platform, from supply chain management to inventory to site capabilities, to Azure for trusted, always-on, hyperscale, and cost-effective computing.
Gap Inc. accelerates its digital transformation
By building and centralizing its data platform on Azure, Gap Inc. can now apply advanced analytics and machine learning to gain a comprehensive understanding of customers across channels in all brands in its portfolio.
Azure HDInsight updates, blogs, and announcements
General availability: Azure HDInsight extends capabilities for encryption of data in transit and at rest
HDInsight HBase Accelerated Writes with Premium Data Lake Storage Gen2 is now generally available
Azure HDInsight ID Broker (HIB) is now generally available
Azure HDInsight now supports Private Link in preview
New region added to Azure HDInsight
Azure HDInsight Autoscale for Interactive Query with HDInsight 4.0 is now generally available
Azure HDInsight now supports virtual network service endpoint policies
Frequently asked questions about HDInsight
You would benefit from Azure HDInsight if you use custom code to process and analyze extremely large datasets with the latest big data processing frameworks, such as Spark, Hadoop, Hive, Kafka, or HBase. Azure HDInsight gives you full control over the configuration of your clusters and the software installed on them. You might also consider HDInsight if you are migrating Hortonworks, Cloudera, or MapR clusters from on-premises environments or other clouds.
Azure HDInsight can be used for a variety of big data processing scenarios. The data can be historical (already collected and stored) or real-time (streamed directly from the source). These scenarios fall into the following categories: batch processing (ETL), data warehousing, Internet of Things (IoT), data science, and hybrid.
To learn more about HDInsight cluster types and provisioning methods, read our documentation about how to set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka, and more.