Running Azure Batch jobs using the Azure CLI – no code required

Posted on July 13, 2017

Principal Program Manager

When we introduced Azure Batch, the target audience was developers building SaaS or client solutions that needed to run applications or algorithms at scale. Developers use the Batch APIs to integrate with Batch and use it as a component within their solutions.

Since launch, we have been adding capabilities that make it easier to use Batch without writing code, expanding the audience that can take advantage of it. We are pleased to announce new Azure CLI capabilities that make it possible to define and run jobs end to end. Users can, directly or via scripts, create pools, upload data, run jobs at scale, and download output data, all using the Azure CLI with no code required.

Batch templates

Batch templates build on the existing Batch support in the Azure CLI, which allows JSON files to specify property names and values when creating pools, jobs, tasks, and other items. Compared with those JSON files, Batch templates add the following capabilities:

  • Parameters can be defined. When the template is used, only the parameter values need to be supplied to create the item; all other property values are specified in the template body. A user who understands Batch and the applications to be run can author the templates, specifying the pool, job, and task property values. A user less familiar with Batch or the applications then only needs to supply values for the defined parameters.
  • Job task factories create the tasks associated with a job, so there is no need to write a separate definition for each task; this greatly simplifies job submission.
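As an illustration of how parameter values flow into a template body, the following minimal Python sketch replaces [parameters('name')] expressions, the syntax used by the example templates in this post. It is not the Azure CLI's implementation: the function name substitute is hypothetical, and the rule that a string consisting of a single expression keeps the parameter's original type (so an integer node count stays an integer) is an assumption about how such a substitution would reasonably behave.

```python
import re
from typing import Any

# Matches expressions of the form [parameters('name')] as written in the templates.
PARAM_EXPR = re.compile(r"\[parameters\('([^']+)'\)\]")


def substitute(node: Any, values: dict[str, Any]) -> Any:
    """Return a copy of a template body with parameter expressions replaced.

    Hypothetical sketch: a string that is exactly one expression takes the
    parameter's value with its original type; an expression embedded in a
    longer string (such as a command line) is replaced by the value's text.
    """
    if isinstance(node, dict):
        return {key: substitute(value, values) for key, value in node.items()}
    if isinstance(node, list):
        return [substitute(item, values) for item in node]
    if isinstance(node, str):
        whole = PARAM_EXPR.fullmatch(node)
        if whole:
            return values[whole.group(1)]
        return PARAM_EXPR.sub(lambda match: str(values[match.group(1)]), node)
    return node  # numbers, booleans, and None pass through unchanged


if __name__ == "__main__":
    pool_body = {
        "id": "[parameters('poolId')]",
        "targetDedicatedNodes": "[parameters('nodeCount')]",
        "vmSize": "STANDARD_D3_V2",
    }
    print(substitute(pool_body, {"poolId": "MyFfmpegPool", "nodeCount": 20}))
    # {'id': 'MyFfmpegPool', 'targetDedicatedNodes': 20, 'vmSize': 'STANDARD_D3_V2'}
```

Keeping the type for whole-string expressions matters for properties such as a node count, which the Batch service expects as a number rather than text.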

Upload and download of input and output files

Jobs need input data files and often produce output data files. Each Batch account can have an associated Azure Storage account; using the Azure CLI, files can now be transferred between a client and this storage account with no coding required. Pool and job templates can also reference these files so that they are transferred to and from pool nodes.

Example – transcoding video files using ffmpeg

ffmpeg is a popular application that processes audio and video files. The Azure Batch CLI can be used to invoke ffmpeg to transcode multiple video files in parallel, converting source video files to different resolutions.

Create a pool template

A pool of VM nodes is required: ffmpeg must be installed on the nodes, and the individual transcodes run on them. Someone with knowledge of Batch and ffmpeg defines a pool template, written so that only a pool id and the number of nodes need to be specified when it is used to create the pool.

The template defines:

  • Two parameters whose values must be supplied when the template is used to create a pool: the pool id and the number of pool nodes.
  • A template body that specifies the OS, VM size, the ffmpeg package, and other pool properties.

An example pool template would be:

{
    "parameters": {
        "nodeCount": {
            "type": "int",
            "metadata": { "description": "The number of pool nodes" }
        },
        "poolId": {
            "type": "string",
            "metadata": { "description": "The pool id" }
        }
    },
    "pool": {
        "type": "Microsoft.Batch/batchAccounts/pools",
        "apiVersion": "2016-12-01",
        "properties": {
            "id": "[parameters('poolId')]",
            "virtualMachineConfiguration": {
                "imageReference": {
                    "publisher": "Canonical",
                    "offer": "UbuntuServer",
                    "sku": "16.04.0-LTS",
                    "version": "latest"
                },
                "nodeAgentSKUId": "batch.node.ubuntu 16.04"
            },
            "vmSize": "STANDARD_D3_V2",
            "targetDedicatedNodes": "[parameters('nodeCount')]",
            "enableAutoScale": false,
            "maxTasksPerNode": 1,
            "packageReferences": [
                {
                    "type": "aptPackage",
                    "id": "ffmpeg"
                }
            ]
        }
    }
}
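Before values are substituted, they have to be checked against the declared parameters. The sketch below, a hypothetical Python helper named resolve_parameters, shows one way to do this: it converts "int" parameters from the text typed at a prompt, falls back to defaultValue when no value is given, and enforces allowedValues (the job template in this post uses both of these). The validation rules are assumptions for illustration, not a description of the CLI's actual behavior.

```python
from typing import Any


def resolve_parameters(declared: dict[str, dict], supplied: dict[str, Any]) -> dict[str, Any]:
    """Combine supplied values with a template's parameter declarations.

    Hypothetical helper: applies defaultValue when a value is missing, converts
    "int" parameters (for example, text typed at a prompt), and enforces
    allowedValues when the declaration lists them.
    """
    resolved: dict[str, Any] = {}
    for name, spec in declared.items():
        if name in supplied:
            raw = supplied[name]
        elif "defaultValue" in spec:
            raw = spec["defaultValue"]
        else:
            raise ValueError(f"missing value for parameter '{name}'")
        value = int(raw) if spec.get("type") == "int" else str(raw)
        allowed = spec.get("allowedValues")
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name}={value!r} is not one of {allowed}")
        resolved[name] = value
    return resolved


if __name__ == "__main__":
    # Parameter declarations copied from the pool template above.
    pool_parameters = {
        "nodeCount": {"type": "int", "metadata": {"description": "The number of pool nodes"}},
        "poolId": {"type": "string", "metadata": {"description": "The pool id"}},
    }
    print(resolve_parameters(pool_parameters, {"nodeCount": "20", "poolId": "MyFfmpegPool"}))
    # {'nodeCount': 20, 'poolId': 'MyFfmpegPool'}
```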

Create a job template

To transcode the video files, a job is created with one task per video file. Each task invokes ffmpeg with arguments specifying the source video file (copied onto the node before the task runs), the target resolution, and the output file name; the task definition also specifies where the output file is uploaded, along with other task properties.

Someone with knowledge of Batch and ffmpeg defines a job template. The template is written so that only the pool id and job id need to be specified when it is used. For simplicity, source files are assumed to be uploaded to a fixed location (the ffmpeg-input file group), output files are written to a fixed location (the ffmpeg-output file group), and the output resolution defaults to 428x240 (854x480 is the only other allowed value).

{
    "parameters": {
        "poolId": {
            "type": "string",
            "metadata": {
                "description": "The pool id which runs the job"
            }
        },
        "jobId": {
            "type": "string",
            "metadata": {
                "description": "The job id"
            }
        },
        "resolution": {
            "type": "string",
            "defaultValue": "428x240",
            "allowedValues": [
                "428x240",
                "854x480"
            ],
            "metadata": {
                "description": "Target video resolution"
            }
        }
    },
    "job": {
        "type": "Microsoft.Batch/batchAccounts/jobs",
        "apiVersion": "2016-12-01",
        "properties": {
            "id": "[parameters('jobId')]",
            "constraints": {
                "maxWallClockTime": "PT5H",
                "maxTaskRetryCount": 1
            },
            "poolInfo": {
                "poolId": "[parameters('poolId')]"
            },
            "taskFactory": {
                "type": "taskPerFile",
                "source": {
                    "fileGroup": "ffmpeg-input"
                },
                "repeatTask": {
                    "commandLine": "ffmpeg -i {fileName} -y -s [parameters('resolution')] -strict -2 {fileNameWithoutExtension}_[parameters('resolution')].mp4",
                    "resourceFiles": [
                        {
                            "blobSource": "{url}",
                            "filePath": "{fileName}"
                        }
                    ],
                    "outputFiles": [
                        {
                            "filePattern": "{fileNameWithoutExtension}_[parameters('resolution')].mp4",
                            "destination": {
                                "autoStorage": {
                                    "path": "{fileNameWithoutExtension}_[parameters('resolution')].mp4",
                                    "fileGroup": "ffmpeg-output"
                                }
                            },
                            "uploadOptions": {
                                "uploadCondition": "TaskSuccess"
                            }
                        }
                    ]
                }
            },
            "onAllTasksComplete": "terminatejob"
        }
    }
}
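The taskPerFile factory is what turns one template into many tasks. The following Python sketch approximates that expansion: for every file in the input file group, it copies repeatTask and fills in {url}, {fileName}, and {fileNameWithoutExtension}. It assumes the [parameters('resolution')] expressions have already been replaced. The helper names, the example blob URLs, and the sequential task ids are hypothetical, and the CLI's actual expansion may differ in details (for example, for files in subfolders).

```python
import posixpath
from urllib.parse import unquote, urlparse


def fill(node, placeholders: dict[str, str]):
    """Recursively replace placeholder tokens in every string of a task definition."""
    if isinstance(node, dict):
        return {key: fill(value, placeholders) for key, value in node.items()}
    if isinstance(node, list):
        return [fill(item, placeholders) for item in node]
    if isinstance(node, str):
        for token, value in placeholders.items():
            node = node.replace(token, value)
    return node


def expand_task_per_file(blob_urls: list[str], repeat_task: dict) -> list[dict]:
    """Create one task per input file from a repeatTask definition (hypothetical sketch)."""
    tasks = []
    for index, url in enumerate(blob_urls):
        # The file name comes from the URL path, so any SAS query string is ignored.
        file_name = posixpath.basename(unquote(urlparse(url).path))
        placeholders = {
            "{url}": url,
            "{fileName}": file_name,
            "{fileNameWithoutExtension}": posixpath.splitext(file_name)[0],
        }
        task = fill(repeat_task, placeholders)  # fill() builds a new dict; the template is untouched
        task["id"] = str(index)  # assumption: sequential task ids
        tasks.append(task)
    return tasks


if __name__ == "__main__":
    # repeatTask from the job template, with the resolution already substituted.
    repeat_task = {
        "commandLine": "ffmpeg -i {fileName} -y -s 428x240 -strict -2 "
                       "{fileNameWithoutExtension}_428x240.mp4",
        "resourceFiles": [{"blobSource": "{url}", "filePath": "{fileName}"}],
    }
    # Hypothetical blob URLs for two uploaded source videos.
    urls = [
        "https://mystorageaccount.blob.core.windows.net/videos/clip1.mp4",
        "https://mystorageaccount.blob.core.windows.net/videos/clip2.mp4",
    ]
    for task in expand_task_per_file(urls, repeat_task):
        print(task["id"], task["commandLine"])
```

With two input files the job gets two tasks, each copying its own source file onto the node and producing one output file, which is why the job template never has to list tasks individually.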

 

Create a pool using the pool template

A user with files to transcode first creates a pool containing the nodes that will perform the transcodes. When the command is scripted, the parameter values can be passed on the command line; when it is run interactively, the user is prompted for them:

C:\BatchCliTemplates>az batch pool create --template pool-ffmpeg.json
You are using an experimental feature {Pool Template}.
nodeCount (The number of pool nodes): 20
poolId (The pool id): MyFfmpegPool
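When pool creation is scripted, the parameter values can be kept in a file instead of being typed at prompts. The Python sketch below writes such a file and builds the corresponding command line without running it. Both the --parameters option and the {"name": {"value": ...}} file layout are assumptions to check against az batch pool create --help for your CLI version; the helper names and the file name pool-ffmpeg.parameters.json are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_parameters_file(path: Path, values: dict) -> Path:
    """Write parameter values as {"name": {"value": ...}} (an assumed file layout)."""
    document = {name: {"value": value} for name, value in values.items()}
    path.write_text(json.dumps(document, indent=2))
    return path


def template_command(kind: str, template: str, parameters_file: Path) -> list[str]:
    """Build, but do not run, an 'az batch <kind> create' command for a template.

    The --parameters option is an assumption here; confirm it with --help first.
    """
    return ["az", "batch", kind, "create",
            "--template", template,
            "--parameters", str(parameters_file)]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        params = write_parameters_file(Path(workdir) / "pool-ffmpeg.parameters.json",
                                       {"poolId": "MyFfmpegPool", "nodeCount": 20})
        command = template_command("pool", "pool-ffmpeg.json", params)
        print(" ".join(command))
```

The same helpers would apply to the job template by passing "job" and job-ffmpeg.json, with the pool id and job id as values.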

As a user of the template, I haven’t had to understand Azure VM sizes, pool properties, or how to install ffmpeg.

Upload source files to transcode

Next, I upload the files to be transcoded. I was given the name of the file group to use, ffmpeg-input; a file group corresponds to a container in the Azure Storage account associated with the Batch account.

az batch file upload --local-path c:\source_videos\*.mp4 --file-group ffmpeg-input

Run a job to transcode the source files using the job template

A job is then created with one task per uploaded input file. As with the pool, parameter values can be passed on the command line when the command is scripted; otherwise the user is prompted for them.

az batch job create --template job-ffmpeg.json

As a user of the template, I haven’t had to understand how to invoke ffmpeg with the right arguments for transcoding, nor specify the Batch properties for jobs and tasks.

Download the transcoded files

If the transcoded output files are needed on the client, they can be downloaded from the ffmpeg-output file group:

az batch file download --file-group ffmpeg-output --local-path c:\output_lowres_videos
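Because the job template derives each output name from its input as {fileNameWithoutExtension}_<resolution>.mp4, a short script can check that every expected output was downloaded. This Python sketch assumes the default 428x240 resolution and the local folders used in the commands above; the helper names are hypothetical, and it searches the download folder recursively in case files land in subfolders.

```python
from pathlib import Path


def expected_output_names(input_names: list[str], resolution: str = "428x240") -> set[str]:
    """Names produced by the job template: {fileNameWithoutExtension}_<resolution>.mp4."""
    return {f"{Path(name).stem}_{resolution}.mp4" for name in input_names}


def missing_outputs(source_dir: Path, output_dir: Path, resolution: str = "428x240") -> set[str]:
    """Return expected transcoded files that are not present anywhere under output_dir."""
    inputs = [path.name for path in source_dir.glob("*.mp4")]
    present = {path.name for path in output_dir.rglob("*.mp4")}
    return expected_output_names(inputs, resolution) - present


if __name__ == "__main__":
    # Local folders from the upload and download commands in this post.
    source = Path(r"c:\source_videos")
    output = Path(r"c:\output_lowres_videos")
    if source.is_dir() and output.is_dir():
        missing = missing_outputs(source, output)
        print("all outputs downloaded" if not missing else f"missing: {sorted(missing)}")
```

A missing file usually points at a failed task: because the output files use the TaskSuccess upload condition, a transcode that fails uploads nothing.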

Summary

This example has shown how someone with knowledge of ffmpeg and Azure Batch can create Batch pool and job templates that perform video transcoding with ffmpeg, without using the Batch APIs. To use the templates, an end user only needs the Azure CLI: to supply the pool and job template parameter values, upload the files to transcode, and download the output files.

More information

More detailed information is available: