Apache Spark 2.4 and Apache Kafka 2.1 support on Azure HDInsight

Posted on Monday, July 8, 2019

Azure HDInsight now supports Apache Spark 2.4 and Apache Kafka 2.1. You can select the Spark or Kafka version when you create a cluster in the Azure portal. Both updates come with several new features and hundreds of bug fixes and improvements.

Spark 2.4 adds eager evaluation of DataFrames in notebooks; barrier execution mode for better integration with deep learning frameworks; flexible streaming sinks that enable the use of existing batch connectors; an upgraded Kafka client (from 0.10 to 2.0); built-in higher-order functions; and an Apache Avro data source. For a complete list of updates, check out the release notes of Apache Spark 2.4.
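As a rough illustration of two of these features, the sketch below collects the `spark.sql.repl.eagerEval.*` settings that turn on notebook eager evaluation and defines a query that uses `transform`, one of the built-in higher-order functions. It assumes PySpark 2.4 or later is available wherever `run_demo` is called; the helper names (`eager_eval_conf`, `HIGHER_ORDER_SQL`, `run_demo`) and the app name are made up for this example and are not part of Spark or HDInsight.

```python
def eager_eval_conf(max_rows=20):
    """Return Spark conf entries for eager evaluation (hypothetical helper)."""
    return {
        "spark.sql.repl.eagerEval.enabled": "true",
        "spark.sql.repl.eagerEval.maxNumRows": str(max_rows),
    }


# transform() applies a lambda to every element of an array column.
HIGHER_ORDER_SQL = "SELECT transform(array(1, 2, 3), x -> x + 1) AS incremented"


def run_demo():
    """Apply the settings above and run the query.

    Requires PySpark 2.4+; the import lives here so the helpers above
    stay usable on a machine without Spark installed.
    """
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName("spark24-demo")  # illustrative name
    for key, value in eager_eval_conf().items():
        builder = builder.config(key, value)
    spark = builder.getOrCreate()  # reuses the notebook's session if one exists

    df = spark.sql(HIGHER_ORDER_SQL)
    df.show()
    return df
```

In a notebook attached to a Spark 2.4 cluster, calling `run_demo()` prints the transformed array. Whether a bare DataFrame expression is also rendered as a table depends on the notebook front end honoring the eager evaluation setting.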

By switching to Kafka 2.1 from the previous version (1.1) on HDInsight, customers get better broker resiliency due to an improved replication protocol; new functionality in the KafkaAdminClient API; configurable quota management; and support for Zstandard compression. For a complete list of updates, check out the release notes of Kafka 2.0 and Kafka 2.1.
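To make the Zstandard item concrete: on the producer side, compression is selected with the standard `compression.type` setting, which accepts `zstd` starting with Kafka 2.1. The sketch below only assembles and renders producer properties. The broker addresses and the helper names are placeholders for illustration, not HDInsight specifics.

```python
def zstd_producer_props(bootstrap_servers):
    """Build producer properties that request Zstandard compression.

    `bootstrap_servers` is a placeholder for your cluster's broker list.
    """
    return {
        "bootstrap.servers": bootstrap_servers,
        "compression.type": "zstd",  # accepted by Kafka 2.1+ clients and brokers
    }


def to_properties(props):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


# Placeholder broker addresses; substitute the brokers of your own cluster.
print(to_properties(zstd_producer_props("broker-1:9092,broker-2:9092")))
```

The rendered text can be saved to a file and passed to the Kafka console producer through its `--producer.config` option.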

To get started with Azure HDInsight, see our documentation. Follow @AzureHDInsight or the HDInsight blog for the latest updates. For questions and feedback, reach out to AskHDInsight@microsoft.com.
