
Azure Cosmos DB - Change Feed Lab

The Azure Cosmos DB change feed is a mechanism for getting a continuous and incremental feed of records from a Cosmos DB container as those records are being created or modified.

In this lab, you'll focus on how a company can use the change feed feature to its advantage and understand user patterns with real-time data analysis visualization. You will approach the change feed from the perspective of an e-commerce company and work with a collection of events, i.e. records that represent occurrences such as a user viewing an item, adding an item to their cart, or purchasing an item. When one of these events occurs and a new record is consequently created, the change feed will log that record. The change feed will then trigger a series of steps resulting in visualization of metrics analyzing company performance and site activity.

Interested in watching a video about the solution before getting started?

New to Azure Cosmos DB? We recommend checking out Cosmos DB

New to the Cosmos DB change feed feature? We recommend checking out Change Feed

This lab will proceed as follows:

  1. Data Generation: You will simulate data that represents events such as a user viewing an item, adding an item to their cart, and purchasing an item. You will do this in two ways:

(1) Randomized data generator to simulate events in bulk so that you have a large set of sample data   
(2) An e-commerce web site that you will run, where you can view items from a product catalog, add them to your cart, and purchase the items in your cart. When you perform any of those actions, records will be created in Cosmos DB.

Below is an example of a simulated event record:

    {      
        "CartID": 2486,
        "Action": "Viewed",
        "Item": "Women's Denim Jacket",
        "Price": 31.99
    }
  2. Cosmos DB: Cosmos DB will take in the generated data and store it in a collection.
  3. Change Feed: The change feed will listen for changes to the Cosmos DB collection. Every time a new document is added to the collection (i.e. an event occurs such as a user viewing an item, adding an item to their cart, or purchasing an item), the change feed will trigger an Azure Function.
  4. Azure Function: The Azure Function will process the new data and send it to an Azure Event Hub (see the sketch after this list).
  5. Event Hub: The Azure Event Hub will store these events and send them to Azure Stream Analytics.
  6. Azure Stream Analytics: You will write queries by which Azure Stream Analytics will process the events and build a real-time data analysis. It will send that analysis to Microsoft Power BI.
  7. Power BI: You will use Microsoft Power BI to visualize the data analysis from Azure Stream Analytics and build a dashboard so you can see your metrics change in real time. Sample metrics that you can visualize include revenue, unique site visitors, most popular items, and the average price of items that are viewed versus added to a cart versus purchased. These sample metrics can help an e-commerce company evaluate its site popularity, develop its advertising and pricing strategies, and make decisions about what inventory to invest in.
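
For orientation, here is a minimal sketch of the Azure Function described in steps 4 and 5: a Cosmos DB change feed trigger with an Event Hub output binding. The function, event hub, and app setting names below are illustrative placeholders, not necessarily the ones used by the lab's ChangeFeedFunction project.

    // Sketch only: a Cosmos DB-triggered Azure Function that forwards changed
    // documents to an Event Hub. Names such as "event-hub1", "CosmosDBConnection",
    // and "EventHubConnection" are placeholders.
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.Azure.Documents;
    using Microsoft.Azure.WebJobs;
    using Microsoft.Extensions.Logging;

    public static class ChangeFeedToEventHub
    {
        [FunctionName("ChangeFeedToEventHub")]
        public static async Task Run(
            [CosmosDBTrigger(
                databaseName: "changefeedlabdatabase",
                collectionName: "changefeedlabcollection",
                ConnectionStringSetting = "CosmosDBConnection",
                LeaseCollectionName = "leases")] IReadOnlyList<Document> input,
            [EventHub("event-hub1", Connection = "EventHubConnection")] IAsyncCollector<string> output,
            ILogger log)
        {
            // Each document created or modified in the monitored collection arrives
            // here through the change feed...
            foreach (Document doc in input)
            {
                log.LogInformation($"Forwarding document {doc.Id}");
                // ...and is forwarded to Event Hub as JSON for Stream Analytics to consume.
                await output.AddAsync(doc.ToString());
            }
        }
    }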

Prerequisites

  • 64-bit Windows 10 Operating System
  • Microsoft .NET Framework 4.7.1 or higher
  • Microsoft .NET Core 2.1 (or higher) SDK
  • Visual Studio with Universal Windows Platform development, .NET desktop development, and ASP.NET and web development workloads
  • Microsoft Azure Subscription
  • Microsoft Power BI Account

Pre-Lab: Creating Azure Cosmos DB Resources
Part 1: Creating a Database and Collection To Load Data Into
Part 2: Creating a Leases Collection for Change Feed Processing
Part 3: Getting Storage Account Key and Connection String
Part 4: Getting Event Hub Namespace Connection String
Part 5: Setting Up Azure Function with Cosmos DB Account
Part 6: Inserting Simulated Data into Cosmos DB in Real Time
Part 7: Setting Up Azure Stream Analytics and Data Analysis Visualization
Part 8: Connecting to Power BI
Optional: Visualizing with a Real E-Commerce Site

Pre-Lab: Creating Azure Cosmos DB Resources

You will be using Azure Resource Manager to create the Azure resources necessary for the lab.

  1. Make sure that your Windows PowerShell execution policy allows running scripts (for example, Bypass or Unrestricted). To check and set it, open Windows PowerShell as an Administrator and run the following commands:

    Get-ExecutionPolicy

    Set-ExecutionPolicy Bypass

  2. Exit Windows PowerShell.

  3. Clone this repository to your computer.

  4. Open the repository that you just cloned, then navigate to the Azure Resource Manager folder, and open the file called parameters.json in Visual Studio.

  5. Fill in the blanks as indicated in parameters.json. You'll need to use the names that you give to each of your resources later. We recommend keeping this window available for your reference later in the lab.

  6. Open Windows PowerShell again, then navigate to the Azure Resource Manager folder and run the following command:
    .\deploy.ps1

  7. Once prompted, enter the Subscription ID for your Azure subscription (available in the Azure Portal), "changefeedlab" for your resource group name, and "run1" for your deployment name.

    Note: If you are having trouble finding your Subscription ID, go to the Azure Portal and search for Subscriptions, then select your subscription. You will see the Subscription ID in your subscription Overview.

  8. The resources will begin to deploy. This may take up to 10 minutes.

Part 2: Creating a Leases Collection for Change Feed Processing

The leases collection coordinates processing of the change feed across multiple workers. A separate collection is used to store the leases, with one lease per partition.
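
The Azure Functions trigger you configure later manages these leases for you, so you will not write this code in the lab. Purely to illustrate what the leases collection is for, here is a rough sketch of how the same monitored and lease containers would be wired up if you used the change feed processor in the Microsoft.Azure.Cosmos v3 SDK directly (the connection string, processor name, and instance name are placeholders):

    // Sketch only: not part of the lab code. Shows how a change feed processor
    // uses a lease container to coordinate work across worker instances.
    using System;
    using System.Collections.Generic;
    using System.Threading;
    using System.Threading.Tasks;
    using Microsoft.Azure.Cosmos;

    class LeaseDemo
    {
        static async Task Main()
        {
            var client = new CosmosClient("<your Cosmos DB connection string>");
            Container monitored = client.GetContainer("changefeedlabdatabase", "changefeedlabcollection");
            Container leases = client.GetContainer("changefeedlabdatabase", "leases");

            ChangeFeedProcessor processor = monitored
                .GetChangeFeedProcessorBuilder<dynamic>("changeFeedLab", HandleChangesAsync)
                .WithInstanceName("worker-1")   // each worker instance registers under its own name
                .WithLeaseContainer(leases)     // leases record which instance currently owns which partition
                .Build();

            await processor.StartAsync();
            Console.ReadKey();                  // process changes until a key is pressed
            await processor.StopAsync();
        }

        static Task HandleChangesAsync(IReadOnlyCollection<dynamic> changes, CancellationToken cancellationToken)
        {
            foreach (var doc in changes) Console.WriteLine(doc);
            return Task.CompletedTask;
        }
    }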

  1. Return to Data Explorer and select New Collection from the menu on the top of the page. Then perform the following actions:
    a. In the Database id field, select Use existing, then enter changefeedlabdatabase.
    b. In the Collection id field, enter leases.
    c. For Storage capacity, select Fixed.
    d. Leave the Throughput field set to its default value.
    e. Click the OK button.

Part 4: Getting Event Hub Namespace Connection String

An Azure Event Hub receives, stores, processes, and forwards event data. In this lab, the Event Hub will receive a document every time a new event occurs (i.e. an item is viewed by a user, added to a user's cart, or purchased by a user) and will then forward that document to Azure Stream Analytics.
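
The lab's Azure Function sends events to the Event Hub through an output binding rather than hand-written client code, but to make the connection string you are about to copy concrete, here is a rough sketch of sending a single event with the Azure.Messaging.EventHubs package (the connection string and hub name are placeholders):

    // Sketch only: the lab itself sends events via the Azure Function's output binding.
    using System.Text;
    using System.Threading.Tasks;
    using Azure.Messaging.EventHubs;
    using Azure.Messaging.EventHubs.Producer;

    class SendOneEvent
    {
        static async Task Main()
        {
            // Paste the namespace connection string you copy in this part,
            // plus the name of the event hub created in the prelab.
            await using var producer = new EventHubProducerClient(
                "<Event Hub Namespace connection string>",
                "<your event hub name>");

            using EventDataBatch batch = await producer.CreateBatchAsync();
            string json = "{\"CartID\":2486,\"Action\":\"Viewed\",\"Item\":\"Women's Denim Jacket\",\"Price\":31.99}";
            batch.TryAdd(new EventData(Encoding.UTF8.GetBytes(json)));

            await producer.SendAsync(batch);    // downstream, Stream Analytics reads these events as input
        }
    }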

  1. Return to your resource group and open the Event Hub Namespace that you created and named in the prelab.

  2. Select Shared access policies from the menu on the left-hand side.

  3. Select RootManageSharedAccessKey. Take the Connection string-primary key and copy it to a notepad or another document that you will have access to throughout the lab. You should label it Event Hub Namespace connection string. You'll need to copy the string into your code later, so please take note and remember where you are storing it.

Part 6: Inserting Simulated Data into Cosmos DB in Real Time

To see how the change feed processes new actions on an e-commerce site, you will simulate data that represents users viewing items from the product catalog, adding those items to their carts, and purchasing the items in their carts. This data is arbitrary and is meant to replicate what data on an e-commerce site would look like.
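
The DataGenerator project that you will run in this part produces these documents for you. For reference, a simplified version of what such a generator does might look like the following (this uses the Microsoft.Azure.Cosmos v3 SDK and made-up values; the lab's DataGenerator project has its own implementation and item catalog):

    // Sketch only: inserts one random event document into the collection.
    using System;
    using System.Threading.Tasks;
    using Microsoft.Azure.Cosmos;

    public static class EventGenerator
    {
        private static readonly string[] Actions = { "Viewed", "Added", "Purchased" };
        private static readonly Random Rng = new Random();

        public static async Task InsertRandomEventAsync(Container container)
        {
            var evt = new
            {
                id = Guid.NewGuid().ToString(),
                CartID = Rng.Next(1000, 10000),
                Action = Actions[Rng.Next(Actions.Length)],
                Item = "Women's Denim Jacket",                   // the real generator picks from a catalog
                Price = Math.Round(Rng.NextDouble() * 100, 2)
            };

            // Every insert is a new document, so it also appears on the change feed
            // and flows through the Azure Function, Event Hub, and Stream Analytics.
            await container.CreateItemAsync(evt);
        }
    }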

  1. Navigate back to the repository in File Explorer, and right-click ChangeFeedFunction.sln to open it again in a new Visual Studio window.

Note: Yes, we know it may seem weird to have the same solution open in two windows! But it is necessary so that you can run two programs within the solution at the same time.

![data1](/Lab/labpics/data1.png)  
  2. Navigate to the Azure Portal.

  3. Navigate to the resource group that you created during the prelab (named changefeedlab unless you chose otherwise) and then navigate to the Azure Cosmos DB account that you created during the prelab.

  4. Select Keys.

  5. Take the URI and PRIMARY KEY and copy them to a notepad or another document that you will have access to throughout the lab. You should label them URI and PRIMARY KEY. You'll need to copy them into your code later, so please take note and remember where you are storing them.

  6. Return to Visual Studio.

  7. Navigate to the App.config file.

  8. Within the <appSettings> block, add the URI and unique PRIMARY KEY that you just saved.

  9. Add in the collection and database names. (These names should be changefeedlabcollection and changefeedlabdatabase unless you chose to name yours differently.)

  10. Save the changes to all of the edited files.

  11. At the top of Visual Studio, make sure that the Startup Project box to the left of the green arrow says DataGenerator; if it says something else, open the drop-down and select DataGenerator.
    Then press the Start button at the top of the page to run the program.

  12. Wait for the program to run. The stars mean that data is coming in! Keep the program running; it is important that lots of data is collected.

  13. If you navigate to the Azure Portal, then to the Cosmos DB account within your resource group, then to Data Explorer, you will see the randomized data imported into your collection. (Your collection is changefeedlabcollection unless you chose to name yours differently.)

Part 8: Connecting to Power BI

Power BI is a suite of business analytics tools to analyze data and share insights. It's a great example of how you can strategically visualize the data analysis that you coded in the last section.

  1. Sign in to Power BI and navigate to My Workspace by opening the menu on the left-hand side of the page.

  2. Click + Create in the top right-hand corner and then select Dashboard to create a dashboard.

  3. Click + Add tile in the top right-hand corner.

  4. Select Custom Streaming Data, then click the Next button.

  5. Select averagePrice from YOUR DATASETS, then click Next.

  6. In the Visualization Type field, choose Clustered bar chart from the drop-down menu.
    Under Axis, add action.
    Skip Legend without adding anything.
    Then, under the next section called Value, add avg.
    Click Next, then title your chart, and click Apply. You should see a new chart on your dashboard!

  7. Now, if you want to visualize more metrics (optional), you can go back to streamjob1 and create three more outputs with the following fields.
    a. Output alias: incomingRevenueOutput, Dataset name: incomingRevenue, Table name: incomingRevenue
    b. Output alias: top5Output, Dataset name: top5, Table name: top5
    c. Output alias: uniqueVisitorCountOutput, Dataset name: uniqueVisitorCount, Table name: uniqueVisitorCount

    Then click Edit query and paste the following queries above the one you already wrote.

    /*TOP 5*/
    WITH Counter AS
    (
    SELECT Item, Price, Action, COUNT(*) AS countEvents
    FROM input
    WHERE Action = 'Purchased'
    GROUP BY Item, Price, Action, TumblingWindow(second,30)
    ), 
    top5 AS
    (
    SELECT DISTINCT
    CollectTop(5)  OVER (ORDER BY countEvents) AS topEvent
    FROM Counter
    GROUP BY TumblingWindow(second,30)
    ), 
    arrayselect AS 
    (
    SELECT arrayElement.ArrayValue
    FROM top5
    CROSS APPLY GetArrayElements(top5.topevent) AS arrayElement
    ) 
    SELECT arrayvalue.value.item, arrayvalue.value.price, arrayvalue.value.countEvents
    INTO top5Output
    FROM arrayselect
    
    /*REVENUE*/
    SELECT System.TimeStamp AS Time, SUM(Price)
    INTO incomingRevenueOutput
    FROM input
    WHERE Action = 'Purchased'
    GROUP BY TumblingWindow(hour, 1)
    
    /*UNIQUE VISITORS*/
    SELECT System.TimeStamp AS Time, COUNT(DISTINCT CartID) as uniqueVisitors
    INTO uniqueVisitorCountOutput
    FROM input
    GROUP BY TumblingWindow(second, 5)
    

    Note:
    The TOP 5 query calculates the top 5 items, ranked by the number of times that they have been purchased. This metric can help e-commerce companies evaluate which items are most popular and can influence the company's advertising, pricing, and inventory decisions.

    The REVENUE query calculates revenue by summing up the prices of all items purchased within each one-hour tumbling window (the TumblingWindow(hour, 1) clause). This metric can help e-commerce companies evaluate their financial performance and also understand which times of day drive the most revenue. This can impact overall company strategy, marketing in particular.

    The UNIQUE VISITORS query calculates how many unique visitors are on the site every 5 seconds by counting distinct cart IDs. This metric can help e-commerce companies evaluate their site activity and strategize how to acquire more customers.

  8. You can now add tiles for these datasets as well.
    For Top 5, it would make sense to do a clustered column chart with the items as the axis and the count as the value.
    For Revenue, it would make sense to do a line chart with time as the axis and the sum of the prices as the value. The time window to display should be the largest possible in order to deliver as much information as possible.
    For Unique Visitors, it would make sense to do a card visualization with the number of unique visitors as the value.

This is how our dashboard looks with these charts:

Optional: Visualizing with a Real E-commerce Site

You will now observe how you can use your new data analysis tool to connect with a real e-commerce site.

In order to build the e-commerce site, you'll use a Cosmos DB database to store the list of product categories (Women's, Men's, Unisex), the product catalog, and a list of the most popular items.

  1. Navigate back to the Azure Portal, then to your Cosmos DB account, then to Data Explorer.
    Add two collections under changefeedlabdatabase named products and categories with Fixed storage capacity.
    Add another collection under changefeedlabdatabase named topItems with Unlimited storage capacity. Write /Item as the partition key.

  2. Click on the topItems collection, and under Scale and Settings set the Time to Live to be 30 seconds so that topItems updates every 30 seconds.

  3. In order to populate the topItems collection with the most frequently purchased items, navigate back to streamjob1 and add a new Output. Select Cosmos DB.

  4. Fill in the required fields as pictured below.

  5. If you added the optional TOP 5 query in the previous part of the lab, proceed to part 5a.
    If not, proceed to part 5b.

    5a. In streamjob1, select Edit query and paste the following query in your Azure Stream Analytics query editor below the TOP 5 query but above the rest of the queries.

    SELECT arrayvalue.value.item AS Item, arrayvalue.value.price, arrayvalue.value.countEvents
    INTO topItems
    FROM arrayselect
    

    5b. In streamjob1, select Edit query and paste the following query in your Azure Stream Analytics query editor above all other queries.

    /*TOP 5*/
    WITH Counter AS
    (
    SELECT Item, Price, Action, COUNT(*) AS countEvents
    FROM input
    WHERE Action = 'Purchased'
    GROUP BY Item, Price, Action, TumblingWindow(second,30)
    ), 
    top5 AS
    (
    SELECT DISTINCT
    CollectTop(5)  OVER (ORDER BY countEvents) AS topEvent
    FROM Counter
    GROUP BY TumblingWindow(second,30)
    ), 
    arrayselect AS 
    (
    SELECT arrayElement.ArrayValue
    FROM top5
    CROSS APPLY GetArrayElements(top5.topevent) AS arrayElement
    ) 
    SELECT arrayvalue.value.item AS Item, arrayvalue.value.price, arrayvalue.value.countEvents
    INTO topItems
    FROM arrayselect
    
  6. Open EcommerceWebApp.sln and navigate to the Web.config file in the Solution Explorer.

  7. Within the <appSettings> block, add the URI and PRIMARY KEY that you saved earlier where it says your URI here and your primary key here.

    Then add in your database name and collection name as indicated. (These names should be changefeedlabdatabase and changefeedlabcollection unless you chose to name yours differently.)

    Fill in your products collection name, categories collection name, and top items collection name as indicated. (These names should be products, categories, and topItems unless you chose to name yours differently.)

  8. Navigate to and open the Checkout folder within EcommerceWebApp.sln. Then open the Web.config file within that folder.

  9. Within the <appSettings> block, add the URI and PRIMARY KEY that you saved earlier where indicated.
    Then add in your database name and collection name as indicated. (These names should be changefeedlabdatabase and changefeedlabcollection unless you chose to name yours differently.)

  10. Press Start at the top of the page to run the program.

  11. Now you can play around on the e-commerce site. When you view an item, add an item to your cart, change the quantity of an item in your cart, or purchase an item, these events will be passed through the Cosmos DB change feed to the Event Hub, Azure Stream Analytics, and then Power BI. We recommend continuing to run DataGenerator to generate significant web traffic data and provide a realistic set of "Hot Products" on the e-commerce site.

To delete the resources that you created during this lab, navigate to the resource group on Azure Portal, then select Delete resource group from the menu at the top of the page and follow the instructions provided.

This is just one example of how the change feed can be used by Cosmos DB customers.

Feedback? Let us know here!

For more information about what's happening at Azure Cosmos DB, check out the Cosmos DB Blog!