
Azure Data Factory Updates: Integration with Azure Machine Learning!

In this post, we will take a look at the Azure Data Factory integration with Azure Machine Learning.

We listened to your feedback, and last week Azure Data Factory released a powerful enhancement enabling integration with Azure Machine Learning. You can now run your finished Azure Machine Learning models from within your data factory pipelines, repeatedly feeding your trained scoring models with data from multiple sources. This seamless integration enables batch prediction scenarios such as identifying possible loan defaults, determining sentiment, and analyzing customer behavior patterns.

Get started quickly by creating an AzureMLLinkedService and an AzureMLBatchScoringActivity to invoke your Azure Machine Learning models in batch from a data pipeline.

AzureMLLinkedService

This linked service holds the batch scoring URL (mlEndpoint) of a Machine Learning model published in an Azure ML workspace. Along with the endpoint, you will need the API Key that grants access to your Machine Learning models; you can easily retrieve both from your published Machine Learning model (see Figure 1 below).

Example:

{
    "name": "MyAzureMLLinkedService",
    "properties": {
        "type": "AzureMLLinkedService",
        "hubName": "Hub-AzureML",
        "mlEndpoint": "https://[batch scoring endpoint]/jobs",
        "apiKey": "[apikey]"
    }
}

Figure 1. Find the API Key and batch scoring URL on the published model's dashboard. The batch scoring URL for the AzureMLLinkedService can be obtained by clicking 'API help page' in the image above.

Example Batch Scoring URL:
https://ussouthcentral.services.azureml.net/workspaces/da9e895b758e44b2812a6218d507e216/services/8c91ff3461a416f8f8e0d96a1162681/jobs/

AzureMLBatchScoringActivity

This new Machine Learning activity type in Azure Data Factory lets you easily operationalize your finished Machine Learning models without writing any custom code. It gets the location of the input file from your input tables and calls the Azure ML Batch Scoring API. After the Scoring API completes successfully, it copies the batch scoring output to the Azure Blob specified in your output table. Unlike other Data Factory activities, an AzureMLBatchScoringActivity can have only one input and one output table.

Example:

{
    "name": "PredictivePipeline",
    "properties": {
        "description": "use AzureML model",
        "hubName": "Hub-AzureML",
        "activities": [
            {
                "name": "MLActivity",
                "type": "AzureMLBatchScoringActivity",
                "description": "prediction analysis on batch input",
                "inputs": [
                    {
                        "name": "ScoringInputBlob"
                    }
                ],
                "outputs": [
                    {
                        "name": "ScoringResultBlob"
                    }
                ],
                "linkedServiceName": "MyAzureMLLinkedService",
                "policy": {
                    "concurrency": 3,
                    "executionPriorityOrder": "NewestFirst",
                    "retry": 1,
                    "timeout": "02:00:00"
                }
            }
        ]
    }
}
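
Note that the ScoringInputBlob and ScoringResultBlob tables referenced above must also be defined as Azure Blob tables in the same data factory. As a minimal sketch, an input table definition might look something like the following; the bracketed values are placeholders you would replace with your own container, folder, file, and storage linked service names, and the exact table schema may vary slightly with your Data Factory version:

{
    "name": "ScoringInputBlob",
    "properties": {
        "location": {
            "type": "AzureBlobLocation",
            "folderPath": "[container]/[input folder]",
            "fileName": "[input file].csv",
            "format": { "type": "TextFormat" },
            "linkedServiceName": "[your storage linked service]"
        },
        "availability": {
            "frequency": "Day",
            "interval": 1
        }
    }
}

The ScoringResultBlob table is defined the same way, pointing at the folder where the batch scoring output should land.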

With the simple steps described above, customers who publish Azure Machine Learning models can now quickly and easily operationalize them within their Azure Data Factory pipelines.

To find more details about creating predictive pipelines using Azure Data Factory and Azure Machine Learning, please click here.

You can also visit the Azure Data Factory GitHub repository and try out our E2E Twitter Analysis Sample using the newly released integration.