An Azure Function orchestrates a real-time, serverless, big data pipeline

Posted on December 10, 2018

Software Architect, Microsoft Azure

Although it’s not a typical use case for Azure Functions, a single Azure function is all it took to fully implement an end-to-end, real-time, mission-critical data pipeline for a fraud detection scenario. And it was done with a serverless architecture. Two blogs recently described this use case, “Considering Azure Functions for a serverless data streaming scenario,” and “A fast, serverless, big data pipeline powered by a single Azure Function.”


Pipeline requirements

A large bank wanted to build a solution to detect fraudulent transactions. The solution was built on an architectural pattern common for big data analytic pipelines, with massive volumes of real-time data ingested into a cloud service where a series of data transformation activities provided input for a machine learning model to deliver predictions. Latency and response times are critical in a fraud detection solution, so the pipeline had to be very fast and scalable. End-to-end evaluation of each transaction had to complete and provide a fraud assessment in less than two seconds.

Requirements for the pipeline included the following:

  • Ability to scale and efficiently process bursts of event activity totaling 8+ million transactions daily.
  • Daily parsing and processing of 4 million complex JSON files.
  • Events and transactions had to be processed in sequential order, with guarantees that duplicates would not be processed.
  • Reference data and business rules could change dynamically, and the pipeline needed to accommodate these updates.
  • A deployed architecture that could easily integrate with a CI/CD and DevOps process.
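The ordering and deduplication requirement above can be sketched in plain Python. This is a minimal illustration, not the bank's implementation: the event fields (`id`, `account`, `seq`) and the in-memory tracking structures are assumptions for the example; a production pipeline would typically lean on Event Hubs partition ordering and durable state instead.

```python
def process_event(event: dict, seen_ids: set, last_seq: dict) -> bool:
    """Return True if the event should be processed, or False if it is a
    duplicate or arrived out of order for its account.

    seen_ids  : set of event ids already processed (duplicate suppression)
    last_seq  : highest sequence number processed so far, per account
    """
    event_id = event["id"]
    if event_id in seen_ids:
        return False  # duplicate: this event was already processed
    account = event["account"]
    seq = event["seq"]
    # Enforce per-account sequential order: reject stale sequence numbers.
    if seq <= last_seq.get(account, -1):
        return False
    seen_ids.add(event_id)
    last_seq[account] = seq
    return True
```

A caller would consult this check before running the expensive parsing and scoring steps, so bursts of replayed or reordered events are dropped cheaply.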

Pipeline solution

The pipeline starts and ends with an Azure Function. A single function orchestrates and manages the entire pipeline of activities, including the following:

  1. Consuming, validating, and parsing massive numbers of JSON files.
  2. Invoking a SQL stored procedure to extract data elements from JSON files, with data used to build real-time behavioral profiles for bank accounts and customers, and to generate an analytics feature set.
  3. Invoking a machine learning model to evaluate and score each individual transaction.
  4. Posting the fraud score back to an on-premises API for integration with a case management solution (a separate solution that lets users examine and unblock transactions).
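The four steps above can be sketched as one orchestrating entry point, mirroring how a single function drives the whole pipeline. Everything here is a hedged stand-in: `build_features` substitutes for the SQL stored procedure, `score_transaction` for the machine learning model, and `post_score` for the call to the on-premises API; the field names and scoring logic are invented for illustration only.

```python
import json

def parse_transaction(raw: str) -> dict:
    """Step 1: validate and parse an incoming JSON payload."""
    tx = json.loads(raw)
    for field in ("account", "amount"):
        if field not in tx:
            raise ValueError(f"missing field: {field}")
    return tx

def build_features(tx: dict, profile: dict) -> dict:
    """Step 2: stand-in for the SQL stored procedure that builds a
    behavioral feature set from the transaction and the account profile."""
    avg = profile.get("avg_amount", 0.0)
    return {
        "amount": tx["amount"],
        "ratio_to_avg": tx["amount"] / avg if avg else 0.0,
    }

def score_transaction(features: dict) -> float:
    """Step 3: stand-in for the ML model call; returns a score in [0, 1]."""
    return min(1.0, features["ratio_to_avg"] / 10.0)

def post_score(tx: dict, score: float) -> dict:
    """Step 4: stand-in for posting the score to the on-premises case
    management API; here we just return the payload we would send."""
    return {"account": tx["account"], "fraud_score": score}

def run_pipeline(raw: str, profile: dict) -> dict:
    """Single entry point, mirroring the single Azure Function that
    orchestrates all four pipeline steps in sequence."""
    tx = parse_transaction(raw)
    features = build_features(tx, profile)
    score = score_transaction(features)
    return post_score(tx, score)
```

In the real solution each step calls out to a managed service, but the control flow lives in one function body, which is what makes the pipeline easy to test, deploy, and wire into CI/CD.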

Recommended next steps

If you are designing a real-time, serverless data pipeline and want the flexibility of coding your own methods for integration with other services, or want to deploy through continuous integration, consider using Azure Functions to orchestrate and manage the pipeline.

Read the “Mobile Bank Fraud Solution Guide” to learn details about the architecture and implementation. Read more about the pipeline technology decision and implementation in these two blogs, “Considering Azure Functions for a serverless data streaming scenario,” and “A fast, serverless, big data pipeline powered by a single Azure Function.” We hope you find this helpful and we welcome your feedback.