Apache Kafka for HDInsight

A managed, low-latency, high-throughput service for real-time data

Kafka for HDInsight is a cost-effective, enterprise-grade, open source streaming ingestion service that is easy to set up, manage, and use. Build real-time solutions such as IoT, fraud detection, clickstream analysis, financial alerts, and social media analytics.

Managed Kafka with a 99.9% SLA

Purchasing hardware and then installing and tuning the software takes a lot of time and effort. Keeping those machines up and running so that no data is lost is an even greater challenge and carries a high cost of ownership. Kafka for Azure HDInsight manages all of this for you. In four clicks, Kafka clusters are up and running within minutes, with a 99.9% SLA on Kafka uptime. You can concentrate on writing real-time applications and their logic and on building higher-level pipelines, instead of worrying about installing new Kafka brokers or fixing broken ones.

Rack awareness for Azure Environments

Kafka was designed with a single-dimensional view of a rack, which works well in some environments. However, in environments such as Azure, a rack is separated into two dimensions: Update Domains (UDs) and Fault Domains (FDs). HDInsight Kafka provides scalable and robust tools to ensure that Kafka is rack aware in Azure environments. These tools rebalance partitions and replicas across the UDs and FDs for the highest levels of Kafka availability across Azure Availability Zones.
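To make the two-dimensional idea concrete, here is a minimal sketch of replica placement in which no two replicas of a partition share an Update Domain or a Fault Domain. It is not the HDInsight rebalancing tool. The function name `assign_replicas`, the broker layout, and the greedy strategy are all assumptions made for illustration.

```python
def assign_replicas(brokers, num_partitions, replication_factor):
    """Place replicas so each partition's replicas sit in distinct UDs and FDs.

    brokers: dict mapping broker id -> (update_domain, fault_domain).
    Hypothetical helper for illustration only; the greedy search can fail
    even when a valid placement exists.
    """
    broker_ids = sorted(brokers)
    assignment = {}
    for partition in range(num_partitions):
        chosen, used_uds, used_fds = [], set(), set()
        # Rotate the starting broker so first replicas spread across the cluster.
        for offset in range(len(broker_ids)):
            broker = broker_ids[(partition + offset) % len(broker_ids)]
            ud, fd = brokers[broker]
            if ud in used_uds or fd in used_fds:
                continue  # would share a UD or FD with an earlier replica
            chosen.append(broker)
            used_uds.add(ud)
            used_fds.add(fd)
            if len(chosen) == replication_factor:
                break
        if len(chosen) < replication_factor:
            raise ValueError(f"no UD/FD-disjoint placement found for partition {partition}")
        assignment[partition] = chosen
    return assignment


# Assumed layout: 9 brokers spread over 3 update domains and 3 fault domains.
brokers = {b: (b % 3, b // 3) for b in range(9)}
placement = assign_replicas(brokers, num_partitions=6, replication_factor=3)
for partition, replicas in placement.items():
    print(partition, replicas)
```

With this placement, taking one Update Domain offline for patching or losing one Fault Domain to a hardware failure removes at most one replica of any partition.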

Integration with Azure Managed Disks

Because Kafka workloads are ingestion heavy, the disks attached to the cluster nodes often become the bottleneck. Traditionally, relieving this bottleneck meant adding more nodes. Azure Managed Disks provides cheaper, scalable disks that cost a fraction of the price of a node. HDInsight Kafka integrates with these disks to provide up to 16 TB per node instead of the traditional 1 TB. This gives up to 16 times the storage per node while reducing the cost of each terabyte stored. Our enterprise customers have been able to save thousands of dollars per month due to this innovation.
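As a rough illustration of the storage arithmetic, the sketch below counts the nodes needed to hold a given volume of data at 1 TB and at 16 TB per node. The 48 TB target is a hypothetical figure, and `nodes_for_storage` is an illustrative helper, not part of any HDInsight API.

```python
import math


def nodes_for_storage(total_tb, tb_per_node):
    """Smallest node count whose combined disk capacity covers total_tb."""
    return math.ceil(total_tb / tb_per_node)


target_tb = 48  # hypothetical retained data volume, replicas included
print(nodes_for_storage(target_tb, 1))   # 48 nodes at 1 TB per node
print(nodes_for_storage(target_tb, 16))  # 3 nodes at 16 TB per node
```

This only covers the storage bound. A real cluster may still need more nodes for CPU, network throughput, or the replication factor.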

Out of the box alerting, monitoring and predictive maintenance

Getting a streaming pipeline up and running is just the start. Ensuring that it performs reliably requires huge investments in monitoring and alerting infrastructure. Kafka for HDInsight removes this problem because it is integrated with Azure’s monitoring suite out of the box. This technology allows you to monitor everything from VM-level disk and NIC metrics to JMX metrics from Kafka, Storm, and Spark. Not only can you create powerful alerting and monitoring dashboards, you can also specify scripts and runbooks against these metrics for automated and predictive maintenance of your streaming pipeline.

MirrorMaker support for replicating Kafka data

Kafka is often deployed in multiple environments for disaster recovery, high availability, and on-premises to cloud hybrid scenarios. These require replicating data from one Kafka cluster to another. HDInsight has worked closely with enterprise customers to understand this need, and provides support for data replication scenarios. Mirroring on HDInsight Kafka is easy to set up and use.

Cluster scaling within minutes

Estimates of message size, messages per second, and streaming needs change as the pipeline is used. Traditionally, a cluster is sized for peak traffic, which results in very high costs for unused capacity. When the time comes to add more nodes, the new machines need to be provisioned, installed, and configured, and customizations must be reapplied. With HDInsight Kafka, you can start with small clusters and scale them up as needed, which significantly lowers costs. HDInsight provisions the new nodes and applies your customizations within minutes.
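The sizing estimate mentioned above can be sketched as simple arithmetic: message size times message rate gives the ingest rate, and multiplying by the retention window and replication factor gives the disk space to plan for. All traffic figures here are hypothetical, and `estimate_storage_tb` is an illustrative helper rather than an HDInsight tool.

```python
def estimate_storage_tb(avg_message_bytes, messages_per_sec,
                        retention_hours, replication_factor):
    """Approximate retained data in decimal terabytes (1 TB = 1e12 bytes)."""
    ingest_bytes_per_sec = avg_message_bytes * messages_per_sec
    retained_bytes = (ingest_bytes_per_sec * retention_hours * 3600
                      * replication_factor)
    return retained_bytes / 1e12


# Hypothetical workload: 1,000-byte messages, 24 hours of retention, 3 replicas.
average = estimate_storage_tb(1000, 10_000, 24, 3)  # typical traffic
peak = estimate_storage_tb(1000, 50_000, 24, 3)     # peak traffic
print(f"average: {average:.2f} TB, peak: {peak:.2f} TB")
```

The gap between the two figures is the capacity that sits unused when a cluster is sized for peak traffic from day one.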

What can you build with Kafka for HDInsight?

Learn about use cases below:

Data comes from a variety of event sources (applications, devices, sensors, web, social media) and is collected in the cloud through APIs or field gateways. The data stream is ingested by Kafka for HDInsight so that it can be processed and analyzed with services such as Azure Machine Learning, Spark for HDInsight, Storm for HDInsight, and storage adapters. Data then moves to long-term storage with services such as Apache HBase on HDInsight, DocumentDB, MongoDB, SQL, Solr, Azure Data Lake Store, and Azure Search. You can then run queries, analytics, and real-time dashboards, or send the data back to devices to take action.

Customers using Kafka for HDInsight

Office 365
Bing Ads
Toyota Connected

"Toyota manufactures millions of cars running globally, and building a connected car platform to process real-time data at Toyota scale is a monumental challenge. To process events at Toyota's scale, technologies such as Kafka need to be leveraged. Since HDInsight is the only managed platform that provides Kafka as a managed service with a 99.9% SLA, Toyota was able to leverage the scalable technology of Kafka, Storm and Spark on Azure HDInsight. Using the HDInsight platform, we were able to deploy enterprise grade streaming pipelines to process events from millions of cars every second. This is just scratching the surface - the future of global connected cars on Azure HDInsight is bright, and we are excited for what's in store."

Vijay Chemuturi, Chief Product Owner, Toyota Connected

New to Kafka for HDInsight?

Use the links below to create robust, enterprise-ready streaming pipelines using Kafka, Storm, and Spark Streaming on Azure.

Monitor real-time streaming pipelines with Azure

Learn how to use HDInsight Kafka's integration with Azure Monitoring to create powerful alerting and monitoring dashboards, and automated scripts and runbooks for predictive maintenance of your streaming pipeline.

Try Kafka for HDInsight