Provision cloud Hadoop, Spark, R Server, HBase, and Storm clusters
Manage your big data needs in an open-source platform
Run popular open-source frameworks – including Apache Hadoop, Spark, Hive, Kafka and more – using Azure HDInsight, a customisable, enterprise-grade service for open-source analytics. Effortlessly process massive amounts of data and get all the benefits of the broad open-source project ecosystem with the global scale of Azure. Easily migrate your big data workloads and processing to the cloud.
Spin up open-source projects and clusters quickly, without installing hardware or managing infrastructure
Big data clusters reduce costs through auto-scaling and pricing tiers that let you pay only for what you use
Enterprise-grade security and industry-leading compliance, with more than 30 certifications, help protect your data
Optimised components for open-source technologies such as Hadoop and Spark keep you up to date
Build your projects in an open-source ecosystem
Stay up to date with the newest releases of open-source frameworks, including Kafka, HBase and Hive LLAP. HDInsight supports the latest open-source projects from the Apache Hadoop and Spark ecosystems.
Integrate natively with Azure services
Build your data lake through seamless integration with Azure data storage solutions and services, including Azure Synapse Analytics, Azure Cosmos DB, Azure Data Lake Storage, Azure Blob Storage, Azure Event Hubs and Azure Data Factory. Control costs by choosing from a wide variety of virtual machines and by using load-based or schedule-based auto-scaling. Monitor your entire data lake using Azure Monitor dashboards.
Get the flexibility of multiple languages and tools
End-to-end security for analytics workloads
- Secure your cluster with virtual network (VNet) isolation, and control outbound traffic using Azure Firewall.
- Sign in using your corporate domain credentials with Azure Active Directory (Azure AD) and multi-factor authentication.
- Enforce fine-grained authorisation policies using Apache Ranger, including data masking and row-level filtering.
- Protect your data end to end using your own encryption keys for data at rest, together with encryption in transit.
Only pay for what you need
HDInsight offers a broad range of memory-optimised and compute-optimised virtual machines. Choose the one that best suits your performance and cost requirements.
Trusted by companies of all sizes
Myntra accelerates its digital transformation
Myntra has worked closely with Microsoft to migrate its platform, from supply chain management to inventory to site capabilities, to Azure for trusted, always-on, hyperscale and cost-effective computing.
Gap Inc. accelerates its digital transformation
By building and centralising its data platform on Azure, Gap Inc. can now apply advanced analytics and machine learning to gain a comprehensive understanding of its customers across channels and across all the brands in its portfolio.
Frequently asked questions about HDInsight
You would benefit from Azure HDInsight if you use custom code to process and analyse extremely large data sets with the latest big data processing frameworks, such as Spark, Hadoop, Hive, Kafka or HBase. Azure HDInsight gives you full control over the configuration of your clusters and the software installed on them. You might also consider HDInsight if you are migrating Hortonworks, Cloudera or MapR clusters from on-premises environments or other clouds.
Azure HDInsight can be used for a variety of big data processing scenarios. The data can be historical (already collected and stored) or real-time (streamed directly from the source). The scenarios for processing such data fall into the following categories: batch processing (ETL), data warehousing, Internet of Things (IoT), data science and hybrid.
To learn more about HDInsight cluster types and provisioning methods, read our documentation about how to set up clusters in HDInsight with Apache Hadoop, Apache Spark, Apache Kafka and more.