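Apache Kafka for HDInsight

A managed, high-throughput, low-latency service for real-time data

Kafka for HDInsight is an enterprise-grade, open-source streaming ingestion service that is cost-effective and easy to provision, manage, and use. Build real-time solutions such as Internet of Things (IoT), fraud detection, clickstream analysis, financial alerts, and social analytics.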

Managed Kafka with a 99.9% SLA

Purchasing hardware, then installing and tuning the software, takes a lot of time and effort. Keeping those machines up and running so that no data is lost is an even greater challenge and carries a high cost of ownership. Kafka for Azure HDInsight manages all of this for you. In four clicks, a Kafka cluster is up and running within minutes, with a 99.9% SLA on Kafka uptime. You can concentrate on writing real-time applications and their logic and on building higher-level pipelines, instead of worrying about installing new Kafka brokers or fixing broken ones.

Rack awareness for Azure Environments

Kafka was designed with a one-dimensional view of a rack, which works well in some environments. In environments such as Azure, however, a rack is split into two dimensions: Update Domains (UDs) and Fault Domains (FDs). HDInsight Kafka provides scalable and robust tools to ensure that Kafka is rack aware in Azure environments. These tools rebalance partitions and replicas across the UDs and FDs for the highest level of Kafka availability across Azure Availability Zones.
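To make the two-dimensional rack idea concrete, here is a minimal sketch of replica placement that prefers brokers in update domains and fault domains not yet used by the same partition. It is an illustration only, not the HDInsight rebalance tool: the function name `assign_replicas`, the broker layout, and the greedy strategy are all assumptions made for this example.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Greedy, best-effort placement (hypothetical helper, not an HDInsight API).

    brokers maps broker id -> (update_domain, fault_domain).
    Returns a dict mapping partition number -> list of broker ids.
    """
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds broker count")
    load = {b: 0 for b in brokers}  # replicas placed on each broker so far
    assignment = {}
    for partition in range(num_partitions):
        chosen = []
        for _ in range(replication_factor):
            used_uds = {brokers[b][0] for b in chosen}
            used_fds = {brokers[b][1] for b in chosen}
            candidates = [b for b in brokers if b not in chosen]
            # Rank by: domains already used by this partition, then load, then id.
            best = min(
                candidates,
                key=lambda b: (
                    (brokers[b][0] in used_uds) + (brokers[b][1] in used_fds),
                    load[b],
                    b,
                ),
            )
            chosen.append(best)
            load[best] += 1
        assignment[partition] = chosen
    return assignment


# Hypothetical cluster: six brokers spread over three UDs and three FDs.
example_brokers = {
    0: (0, 0), 1: (1, 1), 2: (2, 2),
    3: (0, 1), 4: (1, 2), 5: (2, 0),
}
example_assignment = assign_replicas(example_brokers, num_partitions=6, replication_factor=3)
for partition, replicas in example_assignment.items():
    print(partition, replicas)
```

In this layout every partition ends up with its three replicas in three different UDs and three different FDs, so losing a single domain of either kind leaves two replicas available. A greedy pass like this is best effort; it does not guarantee an optimal spread for every broker layout.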

Integration with Azure Managed Disks

Because Kafka workloads are ingestion heavy, the disks attached to the cluster nodes often become the bottleneck. Traditionally, relieving this bottleneck meant adding more nodes. Azure Managed Disks provides scalable disks that cost a fraction of the price of a node. HDInsight Kafka integrates with these disks to provide up to 16 TB per node instead of the traditional 1 TB. That is up to 16 times the storage per node, so a storage-bound cluster needs far fewer nodes and costs correspondingly less. Our enterprise customers have been able to save thousands of dollars per month due to this innovation.
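The effect of the per-node storage limit on cluster size can be checked with simple arithmetic. The sketch below assumes a cluster that is bound only by disk capacity; the 40 TB retention figure and the helper name `nodes_needed` are hypothetical, chosen for illustration.

```python
import math


def nodes_needed(retained_tb: float, tb_per_node: float) -> int:
    """Smallest node count whose combined disk capacity holds retained_tb."""
    return math.ceil(retained_tb / tb_per_node)


retained_tb = 40.0  # hypothetical data to retain, replication included

nodes_at_1tb = nodes_needed(retained_tb, 1.0)    # traditional 1 TB per node
nodes_at_16tb = nodes_needed(retained_tb, 16.0)  # up to 16 TB per node

print(nodes_at_1tb, nodes_at_16tb)
```

With these assumed numbers the node count drops from 40 to 3. In practice CPU, memory, and network throughput also constrain how few nodes a cluster can run on, so the storage figure is a lower bound rather than a sizing recommendation.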

Out-of-the-box alerting, monitoring, and predictive maintenance

Getting a streaming pipeline up and running is just the start. Ensuring that it performs reliably requires huge investments in monitoring and alerting infrastructure. Kafka for HDInsight removes this problem because it is integrated with Azure's monitoring suite out of the box. This integration lets you monitor everything from VM-level disk and NIC metrics to JMX metrics from Kafka, Storm, and Spark. Not only can you create powerful alerting and monitoring dashboards, you can also specify scripts and runbooks against these metrics for automated and predictive maintenance of your streaming pipeline.

MirrorMaker support for replicating Kafka data

Kafka is often deployed in multiple environments for disaster recovery, high availability, and on-premises-to-cloud hybrid scenarios. These scenarios require replicating data from one Kafka cluster to another. HDInsight has worked closely with enterprise customers to understand this need and supports data replication with MirrorMaker. Mirroring on HDInsight Kafka is easy to set up and use.
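As a rough idea of what a mirroring setup involves, the sketch below generates the two property files and the command line that the classic MirrorMaker tool shipped with Apache Kafka expects. Everything specific here is an assumption for illustration: the broker host names, the consumer group name, the topic name `clicks`, and the file names. The script location on an HDInsight cluster is not shown, and newer Kafka releases may prefer a different topic-selection flag than `--whitelist`.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text (one key=value per line)."""
    return "\n".join(f"{key}={value}" for key, value in settings.items()) + "\n"


# Consumer side: reads from the source cluster (host names are hypothetical).
consumer_properties = render_properties({
    "bootstrap.servers": "source-broker-1:9092,source-broker-2:9092",
    "group.id": "mirror-group",
})

# Producer side: writes to the destination cluster (host names are hypothetical).
producer_properties = render_properties({
    "bootstrap.servers": "dest-broker-1:9092,dest-broker-2:9092",
})

# Command line for the classic MirrorMaker script; the topic name is hypothetical.
mirror_command = [
    "kafka-mirror-maker.sh",
    "--consumer.config", "consumer.properties",
    "--producer.config", "producer.properties",
    "--whitelist", "clicks",
]

print(consumer_properties)
print(producer_properties)
print(" ".join(mirror_command))
```

The consumer file points at the cluster being read and the producer file at the cluster being written, which is the whole of the one-way replication model: MirrorMaker is a consumer and a producer joined together.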

Cluster scaling within minutes

Estimates of message size, messages per second, and other streaming needs change as the pipeline is used. Traditionally, a cluster is sized for peak traffic, which results in very high costs for unused capacity. When the time comes to add more nodes, the new machines must be provisioned, installed, and configured, and customizations must be reapplied. With HDInsight Kafka, you can start with a small cluster and scale it up as needed, so you pay far less for unused capacity. HDInsight takes care of provisioning the new nodes and applies your customizations within minutes.
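The cost difference between sizing for peak on day one and scaling up with demand can be estimated from message size and message rate. The following sketch uses made-up numbers throughout: the 1,000-byte message size, the monthly message rates, and the 40 MB/s per-broker capacity are assumptions, not HDInsight figures.

```python
def brokers_for_rate(message_bytes, messages_per_sec, broker_bytes_per_sec):
    """Brokers needed to absorb the ingest rate (integer ceiling, minimum 1)."""
    ingest_bytes_per_sec = message_bytes * messages_per_sec
    return max(1, -(-ingest_bytes_per_sec // broker_bytes_per_sec))


MESSAGE_BYTES = 1_000              # hypothetical average message size
BROKER_BYTES_PER_SEC = 40_000_000  # hypothetical sustained capacity per broker

# Hypothetical growth in messages per second over six months.
monthly_rates = [20_000, 40_000, 80_000, 120_000, 200_000, 240_000]

brokers_per_month = [
    brokers_for_rate(MESSAGE_BYTES, rate, BROKER_BYTES_PER_SEC)
    for rate in monthly_rates
]

# Sized for peak from the first month versus scaled up month by month.
peak_sized_broker_months = max(brokers_per_month) * len(monthly_rates)
scaled_broker_months = sum(brokers_per_month)

print(brokers_per_month)
print(peak_sized_broker_months, scaled_broker_months)
```

Under these assumptions the peak-sized cluster consumes 36 broker-months and the scaled cluster 18, half as much for the same traffic. The actual saving depends on how quickly load grows toward its peak.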

What can you build with Kafka for HDInsight?

Learn about use cases below:

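Data comes from a variety of event sources (applications, devices, sensors, web, social) and is ingested in the cloud through a Web API or field gateways. Data streams are ingested into Kafka for HDInsight for processing and analysis with services such as Azure Machine Learning, Spark for HDInsight, Storm for HDInsight, and storage adapters. Data is moved to long-term storage using services such as Apache HBase on HDInsight, DocumentDB, MongoDB, SQL, Solr, Azure Data Lake Store, and Azure Search. You can then run real-time dashboards, queries, and analytics, or send data back to devices to take action.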

Customers using Kafka for HDInsight

Office 365
Bing Ads
Toyota Connected

"Toyota manufactures millions of cars running globally, and building a connected car platform to process real-time data at Toyota scale is a monumental challenge. To process events at Toyota's scale, technologies such as Kafka need to be leveraged. Since HDInsight is the only managed platform that provides Kafka as a managed service with a 99.9% SLA, Toyota was able to leverage the scalable technology of Kafka, Storm and Spark on Azure HDInsight. Using the HDInsight platform, we were able to deploy enterprise grade streaming pipelines to process events from millions of cars every second. This is just scratching the surface - the future of global connected cars on Azure HDInsight is bright, and we are excited for what's in store."

Vijay Chemuturi, Chief Product Owner, Toyota Connected

