hdinsight-java-storm-eventhub

By Larry Franks
Last updated: July 13, 2017

An example of how to read from and write to Azure Event Hub using an Apache Storm topology (written in Java) on an Azure HDInsight cluster.

Prerequisites

  • An Azure Event Hub (see "Required information" below)
  • An Apache Storm on HDInsight cluster to deploy to
  • Java and Apache Maven, used to build the project
  • An SSH/SCP client, used to copy the package to the cluster
  • Optionally, a local Apache Storm development environment for testing

How it works

The resources/writer.yaml topology writes random data to an Azure Event Hub. The data, generated by the DeviceSpout component, is a random device ID and device value, simulating hardware that emits a string ID and a numeric value.
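
As an illustration of how the writer side works, here is a minimal, hypothetical sketch of a spout like DeviceSpout. The class name and field layout are assumptions for illustration, not the repo's actual code; it assumes the Storm 1.x Java API and the org.json library:

    import java.util.Map;
    import java.util.Random;
    import java.util.UUID;

    import org.apache.storm.spout.SpoutOutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichSpout;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Values;
    import org.apache.storm.utils.Utils;
    import org.json.JSONObject;

    // Hypothetical sketch: emits one JSON document per tuple in the format described below.
    public class DeviceSpoutSketch extends BaseRichSpout {
        private SpoutOutputCollector collector;
        private Random random;

        @Override
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.random = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100); // throttle emission so local testing doesn't flood the log
            JSONObject doc = new JSONObject();
            doc.put("deviceId", UUID.randomUUID().toString());
            doc.put("deviceValue", random.nextInt(100));
            // The Event Hub bolt is assumed to read the message body from this single field.
            collector.emit(new Values(doc.toString()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("message"));
        }
    }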

The resources/reader.yaml topology reads data from Event Hub (the data written by EventHubWriter), parses the JSON data, and then logs the deviceId and deviceValue data.

The data format in Event Hub is a JSON document with the following format:

{ "deviceId": "unique identifier", "deviceValue": some value }

The data is stored as JSON for compatibility. I once ran into someone who wasn't formatting the data they sent to Event Hub as JSON (from a Java application) and was reading it into another Java app. That worked fine, until they wanted to replace the reading component with a C# application that expected JSON. Problem! Always store data in a nice format that is future-proofed in case your components change.

Required information

  • An Azure Event Hub with two shared access policies: one that has listen permissions, and one that has write permissions. I will refer to these as "reader" and "writer", which is what I named mine.

    • The policy keys for the "reader" and "writer" policies
    • The name of your Event Hub
    • The Service Bus namespace that your Event Hub was created in
    • The number of partitions available with your Event Hub configuration

    For information on creating an Event Hub, see the Create an Event Hubs namespace and event hub document.

Configure and build

  1. Fork & clone the repository so you have a local copy.

  2. Add the Event Hub configuration to the dev.properties file. This is used to configure the spout that reads from Event Hub and the bolt that writes to it. (A sketch of the values this file holds follows these steps.)

  3. Use mvn package to build everything.

    Once the build completes, the target directory will contain a file named EventHubExample-1.0-SNAPSHOT.jar.
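
For reference, the values in dev.properties map the "Required information" above onto the spout and bolt configuration. The key names below are illustrative placeholders only; use the key names already present in the repo's dev.properties file:

    # Hypothetical key names; check the dev.properties shipped with the repo.
    eventhub.read.policy.name: reader
    eventhub.read.policy.key: YOUR_READER_POLICY_KEY
    eventhub.write.policy.name: writer
    eventhub.write.policy.key: YOUR_WRITER_POLICY_KEY
    eventhub.namespace: YOUR_SERVICEBUS_NAMESPACE
    eventhub.name: YOUR_EVENT_HUB_NAME
    eventhub.partitions: YOUR_PARTITION_COUNT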

Test locally

Since these topologies just read from and write to Event Hubs, you can test them locally if you have a Storm development environment. Use the following steps to run them locally in the dev environment:

  1. Run the writer:

    storm jar EventHubExample-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local -R /writer.yaml --filter dev.properties
    
  2. Run the reader:

    storm jar EventHubExample-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --local -R /reader.yaml --filter dev.properties
    

Output is logged to the console when running locally. Use Ctrl+C to stop the topology.

Deploy

  1. Use SCP to copy the jar package to your HDInsight cluster. Replace USERNAME with the SSH user for your cluster. Replace CLUSTERNAME with the name of your HDInsight cluster:

    scp ./target/EventHubExample-1.0-SNAPSHOT.jar USERNAME@CLUSTERNAME-ssh.azurehdinsight.net:EventHubExample-1.0-SNAPSHOT.jar
    

    If you used a password for your SSH account, you will be prompted to enter the password. If you used an SSH key with the account, you may need to use the -i parameter to specify the path to the key file. For example, scp -i ~/.ssh/id_rsa ./target/EventHubExample-1.0-SNAPSHOT.jar USERNAME@CLUSTERNAME-ssh.azurehdinsight.net:EventHubExample-1.0-SNAPSHOT.jar.

    If your client is a Windows workstation, you may not have an SCP command installed. You can get it by installing Bash for Windows 10 or using PSCP. PSCP can be downloaded from the PuTTY download page.

    This command will copy the file to the home directory of your SSH user on the cluster.

  2. Use SCP to copy the dev.properties file to the server:

    scp dev.properties USERNAME@CLUSTERNAME-ssh.azurehdinsight.net:dev.properties
    
  3. Once the file has finished uploading, use SSH to connect to the HDInsight cluster. Replace USERNAME with the name of your SSH login. Replace CLUSTERNAME with your HDInsight cluster name:

    ssh USERNAME@CLUSTERNAME-ssh.azurehdinsight.net
    

    If you used a password for your SSH account, you will be prompted to enter the password. If you used an SSH key with the account, you may need to use the -i parameter to specify the path to the key file. The following example will load the private key from ~/.ssh/id_rsa:

    ssh -i ~/.ssh/id_rsa USERNAME@CLUSTERNAME-ssh.azurehdinsight.net

    If you are using PuTTY, enter CLUSTERNAME-ssh.azurehdinsight.net in the Host Name (or IP address) field, and then click Open to connect. You will be prompted to enter your SSH account name.

    If you used a password for your SSH account, you will be prompted to enter the password. If you used an SSH key with the account, you may need to use the following steps to select the key:

    1. In Category, expand Connection, expand SSH, and select Auth.
    2. Click Browse and select the .ppk file that contains your private key.
    3. Click Open to connect.
  4. Use the following commands to start the topologies:

    storm jar EventHubExample-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote -R /writer.yaml --filter dev.properties
    storm jar EventHubExample-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux --remote -R /reader.yaml --filter dev.properties
    

    This will start the topologies and give them the friendly names of "reader" and "writer".

  5. To view the logged data, go to https://CLUSTERNAME.azurehdinsight.net/stormui, where CLUSTERNAME is the name of your HDInsight cluster. Select the topologies and drill down to the components. Select the port entry for an instance of a component to view logged information.

  6. Use the following commands to stop the topologies:

    storm kill reader
    storm kill writer
    

Project code of conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.