Bring Your Own Keys for Apache Kafka on HDInsight

Posted on October 3, 2018

Program Manager, Azure Big Data

One of the biggest security and compliance requirements for enterprise customers is to encrypt their data at rest using their own encryption key. This is even more critical in a post-GDPR world. Today, we’re announcing the public preview of Bring Your Own Key (BYOK) for data at rest in Apache Kafka on Azure HDInsight.

Azure HDInsight clusters already provide several levels of security. At the perimeter level, traffic can be controlled via Virtual Networks and Network Security Groups. Kerberos authentication and Apache Ranger provide fine-grained control over access to Kafka topics. Further, all managed disks are protected via Azure Storage Service Encryption (SSE). However, for some customers it is vital that they own and manage the keys used to encrypt the data at rest. Some customers achieve this by encrypting all Kafka messages in their producer applications and decrypting them in their consumer applications. This process is cumbersome and involves custom logic. Moreover, it rules out the use of community-supported connectors.

With HDInsight Kafka’s support for Bring Your Own Key (BYOK), encryption at rest is a one-step process handled during cluster creation. To achieve this, customers use a user-assigned managed identity together with Azure Key Vault (AKV). AKV provides highly available, scalable, and secure storage for cryptographic keys.

The data engineer grants the managed identity read access to the key in AKV and then enables BYOK in HDInsight by providing the Azure Key Vault URL associated with the encryption key.
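As a rough illustration of what the key URL carries, the sketch below splits a Key Vault key identifier into its vault URI, key name, and key version. It relies on the standard identifier layout, https://{vault-name}.vault.azure.net/keys/{key-name}/{key-version}. The function name, the tuple fields, and the vault and key names in the example are hypothetical and are not part of any HDInsight API; how HDInsight consumes these values is not covered here.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse


class KeyIdentifier(NamedTuple):
    """Parts of a Key Vault key URL (field names are illustrative)."""
    vault_uri: str
    key_name: str
    key_version: Optional[str]


def parse_key_identifier(url: str) -> KeyIdentifier:
    """Split a key URL of the form https://<vault-host>/keys/<name>[/<version>]."""
    parsed = urlparse(url)
    if parsed.scheme != "https" or not parsed.netloc:
        raise ValueError(f"not an https Key Vault URL: {url!r}")

    # Drop empty segments so a trailing slash does not matter.
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) not in (2, 3) or parts[0] != "keys":
        raise ValueError(f"expected /keys/<name>[/<version>] in {url!r}")

    version = parts[2] if len(parts) == 3 else None
    return KeyIdentifier(f"https://{parsed.netloc}", parts[1], version)


# Hypothetical vault and key names, used only for illustration.
example = parse_key_identifier(
    "https://contoso-vault.vault.azure.net/keys/kafka-kek/0123456789abcdef0123456789abcdef"
)
print(example.vault_uri, example.key_name, example.key_version)
```

Validating the URL up front like this can surface a malformed key reference before cluster creation is attempted.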

All messages sent to the Kafka cluster, including the replicas maintained by Kafka, are stored in Azure Managed Disks. With BYOK turned on, the attached Managed Disks are encrypted with a symmetric Data Encryption Key (DEK), which in turn is protected using the Key Encryption Key (KEK) from the customer’s key vault. The encryption and decryption processes are handled entirely by HDInsight. This setup is transparent to the customer; Kafka clients (producer and consumer applications) need not be modified. The cluster or key vault admin can safely rotate the keys in the key vault via the Azure portal or Azure CLI, and the HDInsight Kafka cluster will start using the new key within minutes.
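The DEK/KEK relationship described above is commonly called envelope encryption. The following is a minimal, self-contained sketch of that key hierarchy, not HDInsight's actual implementation. It uses a toy SHA-256 keystream cipher as a stand-in for a real cipher such as AES, so it must not be used to protect real data. The KEK is simulated as a local value rather than fetched from a key vault, and treating rotation as "unwrap the DEK with the old KEK, rewrap it with the new one" is an assumption made for illustration.

```python
import hashlib
import secrets


def _keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode). NOT secure; stands in for AES."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Prefix a fresh random nonce to the XOR-ed payload."""
    nonce = secrets.token_bytes(16)
    return nonce + _keystream_xor(key, nonce, plaintext)


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Split off the nonce and reverse the XOR."""
    nonce, body = blob[:16], blob[16:]
    return _keystream_xor(key, nonce, body)


# KEK: would live in the customer's key vault (simulated locally here).
kek_v1 = secrets.token_bytes(32)

# DEK: symmetric key that encrypts the disk contents.
dek = secrets.token_bytes(32)
disk_block = encrypt(dek, b"kafka log segment bytes")

# Only the wrapped (encrypted) DEK is stored alongside the data.
wrapped_dek = encrypt(kek_v1, dek)

# Rotation (assumed model): rewrap the DEK under a new KEK.
# The disk data itself is not re-encrypted.
kek_v2 = secrets.token_bytes(32)
rewrapped_dek = encrypt(kek_v2, decrypt(kek_v1, wrapped_dek))

# Reading the disk after rotation: unwrap the DEK, then decrypt the block.
recovered = decrypt(decrypt(kek_v2, rewrapped_dek), disk_block)
print(recovered)
```

Because only the small wrapped DEK depends on the KEK, a rotation in this model touches a few bytes of key material rather than the full disk contents, which is consistent with a new key taking effect quickly.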

Customers must enable soft-delete on the key vault that holds their customer-managed keys. Soft-delete helps protect the keys against ransomware scenarios and accidental deletion.

With BYOK on HDInsight Kafka, enterprise customers can now be more confident than ever in the security of their cluster. This feature unlocks Kafka for customers for whom BYOK is a prerequisite for data at rest. There is no additional charge for enabling this feature.

To get started with BYOK on HDInsight Kafka, please refer to the following documentation:

Follow us on @AzureHDInsight or the HDInsight blog for the latest updates. For questions and feedback, reach out to AskHDInsight@microsoft.com.

About Azure HDInsight

Azure HDInsight is an easy, cost-effective, enterprise-grade service for open source analytics that enables customers to run popular open source frameworks including Apache Hadoop, Spark, Kafka, and others. The service is available in 27 public regions and Azure Government Clouds in the US and Germany. Azure HDInsight powers mission-critical applications in a wide variety of sectors and supports use cases including ETL, streaming, and interactive querying.