Introducing the Ingestion Client for Azure Speech

Published date: June 28, 2021

Speech is an Azure Cognitive Service that lets you build scalable solutions for a variety of speech-related tasks, such as transcribing audio, producing natural-sounding voices, recognizing who is speaking, and translating speech.

Today, we are introducing the Ingestion Client, an Azure solution that will monitor your dedicated Azure Storage container so that audio files landing in that storage are automatically transcribed.

We created this tool to help you set up a full-blown, scalable, and secure transcription pipeline through simple configuration and without any development effort. The Ingestion Client incorporates best practices for handling transcription requests: scaling (to hundreds of thousands of files), error management, retry logic, and various other optimizations. Setup is carried out through an ARM template deployment. The architecture of the solution that this ARM template deploys is shown in the figure below.


[Figure: architecture of the solution deployed by the ARM template]

When a user uploads an audio file to the dedicated Azure Storage container, a timer-triggered Azure Function picks the file up and creates a transcription request using either the Speech-to-text REST API v3.0 or the Speech SDK (the user's choice). When the transcription completes successfully, the solution writes the transcript to the containers from which the audio file was obtained. Additionally, users can choose to apply analytics to the transcript, produce reports, or redact it; each of these options relies on additional resources that are deployed through the ARM template.
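To make the REST path more concrete, here is a minimal Python sketch of the kind of request that creates a batch transcription with the Speech-to-text REST API v3.0. It is not the Ingestion Client's own code. The function name `build_transcription_request`, the `displayName` value, the region, and the sample blob URL are all illustrative assumptions, and the sketch only builds the request without sending it.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, audio_url, locale="en-US"):
    """Build (but do not send) a POST request that creates a batch transcription.

    Hypothetical helper for illustration; the name and defaults are assumptions.
    """
    # Regional endpoint for the v3.0 transcriptions collection.
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.0/transcriptions"
    )
    body = {
        # URL(s) of the audio to transcribe, e.g. a blob URL with a SAS token.
        "contentUrls": [audio_url],
        "locale": locale,
        # Free-form label for the job; the value here is an assumption.
        "displayName": "ingestion-sketch",
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own region, key, and audio URL.
    request = build_transcription_request(
        region="westeurope",
        subscription_key="<your-speech-key>",
        audio_url="https://example.blob.core.windows.net/audio-input/sample.wav",
    )
    print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would submit the job. The Ingestion Client itself adds the polling, retry, and error handling described above, so a sketch like this is only useful for understanding what a single request looks like.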

Explore our guide for more information about the tool and installation notes, and download the code from this GitHub repo.
