Kafka on Ubuntu VMs

Deploy To Azure Deploy To Azure US Gov Visualize

Apache Kafka is publish-subscribe messaging rethought as a distributed commit log.

Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization. It can be elastically and transparently expanded without downtime. Data streams are partitioned and spread over a cluster of machines to allow data streams larger than the capability of any single machine and to allow clusters of coordinated consumers.

Kafka has a modern cluster-centric design that offers strong durability and fault-tolerance guarantees.

This template deploys a Kafka cluster on Ubuntu virtual machines. It also provisions the storage accounts, virtual network, availability sets, public IP addresses and network interfaces required by the installation. In addition, it creates one publicly accessible VM that acts as a "jumpbox", allowing you to SSH into the Kafka nodes for diagnostics or troubleshooting. The template creates the following deployment resources:

  • Virtual Network with three subnets: "dmz 10.0.0.0/24" for the jumpbox VM, "zookeeper 10.0.1.0/24" for the ZooKeeper VMs, and "data 10.0.2.0/24" for the Kafka broker VMs
  • Storage accounts to store VM data disks
  • Public IP address for accessing the jumpbox via ssh
  • Network interface card for each VM
  • Multiple remotely hosted Custom Script Extensions to stripe the data disks and to install and configure Kafka services

Assuming your domainName parameter was "kafkajumpbox" and your region was "West US":

  • Kafka brokers are deployed in the data subnet at 10.0.2.10, 10.0.2.11, 10.0.2.12, etc.
  • ZooKeeper servers are deployed in the zookeeper subnet at 10.0.1.10, 10.0.1.11, 10.0.1.12, etc.
  • From your computer, SSH into the jumpbox: ssh kafkajumpbox.westus.cloudapp.azure.com
  • From the jumpbox, SSH into a Kafka broker: ssh 10.0.2.10

The following table outlines the deployment topology characteristics for each supported t-shirt size:

T-Shirt Size  VM Size      CPU Cores  Memory   Data Disks  # of Brokers  # of ZooKeepers  # of Storage Accounts
============  ===========  =========  =======  ==========  ============  ===============  =====================
Small         Standard_A1  1          1.75 GB  2x1023 GB   3             1                1
Medium        Standard_A3  4          7 GB     8x1023 GB   5             3                2
Large         Standard_A4  8          14 GB    16x1023 GB  5             3                3
XLarge        Standard_A7  8          56 GB    16x1023 GB  8             5                4
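
As a rough sketch of how each t-shirt size maps onto the default static addresses described above, the hypothetical shell function below prints the expected broker and ZooKeeper private IP addresses for a given size. It assumes the default numbering (brokers from 10.0.2.10, ZooKeeper nodes from 10.0.1.10) and the node counts in the table; the name cluster_ips is illustrative and is not part of the template.

```shell
#!/bin/sh
# Hypothetical helper (not part of the template): print the expected private IPs
# for a t-shirt size, assuming the default static addressing scheme.
# The body runs in a subshell "( ... )" so its variables do not leak.
cluster_ips() (
  # Broker and ZooKeeper counts per t-shirt size, taken from the table above.
  case "$1" in
    Small)  brokers=3; zookeepers=1 ;;
    Medium) brokers=5; zookeepers=3 ;;
    Large)  brokers=5; zookeepers=3 ;;
    XLarge) brokers=8; zookeepers=5 ;;
    *) echo "unknown t-shirt size: $1" >&2; exit 1 ;;
  esac

  # Kafka brokers sit in the data subnet, numbered from 10.0.2.10.
  i=0
  while [ "$i" -lt "$brokers" ]; do
    echo "broker    10.0.2.$((10 + i))"
    i=$((i + 1))
  done

  # ZooKeeper nodes sit in the zookeeper subnet, numbered from 10.0.1.10.
  i=0
  while [ "$i" -lt "$zookeepers" ]; do
    echo "zookeeper 10.0.1.$((10 + i))"
    i=$((i + 1))
  done
)
```

For example, under these assumptions cluster_ips Medium lists five broker addresses (10.0.2.10 to 10.0.2.14) and three ZooKeeper addresses (10.0.1.10 to 10.0.1.12).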

How to Run the scripts

You can use the Deploy to Azure button, or use the following method with PowerShell.

Creating a new deployment with PowerShell:

Remember to set your admin username, password and a unique storage account name prefix in azuredeploy-parameters.json.

Create a resource group:

PS C:\Users\azureuser1> New-AzureResourceGroup -Name "AZKFRGKAFKAEA3" -Location 'EastAsia'

Start the deployment:

PS C:\Users\azureuser1> New-AzureResourceGroupDeployment -Name AZKFRGKAFKAV2DEP1 -ResourceGroupName "AZKFRGKAFKAEA3" -TemplateFile C:\gitsrc\azure-quickstart-templates\kafka-ubuntu-multidisks\azuredeploy.json -TemplateParameterFile C:\gitsrc\azure-quickstart-templates\kafka-ubuntu-multidisks\azuredeploy-parameters.json -Verbose

On a successful deployment, the output will look similar to the following (names and parameter values will reflect your own deployment):

DeploymentName    : AZKFRGSPARKV2DEP1
ResourceGroupName : AZKFRGSPARKEA1
ProvisioningState : Succeeded
Timestamp         : 4/28/2015 9:11:19 PM
Mode              : Incremental
TemplateLink      :
Parameters        :

    Name                      Type                       Value
    ========================  =========================  ==========
    region                    String                     West US
    storageAccountNamePrefix  String                     cgnarmstrkafkav4
    domainName                String                     kafkacgnarmv4
    adminUsername             String                     adminuser
    adminPassword             SecureString
    tshirtSize                String                     Small
    jumpbox                   String                     Enabled
    virtualNetworkName        String                     vnet

Check Deployment

To access the individual Kafka nodes, you need to use the publicly accessible jumpbox VM and SSH from it into the VM instances running Kafka.

To get started, connect to the public IP address of the jumpbox with the username and password you provided during deployment. From the jumpbox, connect to any of the Kafka brokers, e.g. ssh 10.0.2.10, ssh 10.0.2.11, etc. Run the command ps -ef | grep kafka to check that the Kafka process is running. You can then run Kafka commands like these:

cd /usr/local/kafka/kafka_2.10-0.8.2.1/

bin/kafka-topics.sh --create --zookeeper 10.0.1.10:2181  --replication-factor 2 --partitions 1 --topic my-replicated-topic1

bin/kafka-topics.sh --describe --zookeeper 10.0.1.10:2181  --topic my-replicated-topic1
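
The --zookeeper option of kafka-topics.sh takes a ZooKeeper connection string, which may list several host:port pairs separated by commas so the tool can fail over between nodes. On the Medium and larger sizes, which run several ZooKeeper nodes, you might prefer to list all of them. The hypothetical helper below builds that string as a sketch, assuming the default addressing (ZooKeeper nodes from 10.0.1.10 upward) and the client port 2181 used in the commands above; the name zk_connect is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: build a comma-separated ZooKeeper connection string
# for COUNT nodes numbered from 10.0.1.10, using PORT (default 2181).
# Example: bin/kafka-topics.sh --describe --zookeeper "$(zk_connect 3)" --topic my-replicated-topic1
zk_connect() (
  count=$1
  port=${2:-2181}
  hosts=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Append "host:port", inserting a comma only after the first entry.
    hosts="${hosts:+$hosts,}10.0.1.$((10 + i)):$port"
    i=$((i + 1))
  done
  printf '%s\n' "$hosts"
)
```

Under these assumptions, zk_connect 3 yields 10.0.1.10:2181,10.0.1.11:2181,10.0.1.12:2181, matching the three ZooKeeper nodes of the Medium and Large sizes.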

Topology

The deployment topology consists of Kafka brokers and ZooKeeper nodes running in cluster mode. Kafka 0.8.2.1 is the default version and can be changed to any pre-built binary available from the Kafka repository. A static IP address is assigned to each Kafka node (by default, the first node is assigned the private IP 10.0.2.10, the second node 10.0.2.11, and so on) and to each ZooKeeper node (by default, the first node is assigned the private IP 10.0.1.10, the second node 10.0.1.11, and so on).

To check for deployment errors, go to the Azure portal and look under Resource Group -> Last deployment -> Check Operation Details.

Known Issues and Limitations

  • Health monitoring of the Kafka instances is not currently enabled
  • SSH key authentication is not yet implemented; the template currently takes a password for the admin user

Tags: Microsoft.Resources/deployments, Microsoft.Network/networkInterfaces, Microsoft.Compute/virtualMachines, Microsoft.Compute/virtualMachines/extensions, CustomScript, Microsoft.Network/publicIPAddresses, Microsoft.Storage/storageAccounts, Microsoft.Network/virtualNetworks, Microsoft.Compute/availabilitySets