Introducing the Ingestion Client for Azure Speech
Published date: 28 June, 2021
Speech is an Azure Cognitive Service that enables you to build scalable solutions for a variety of speech-related tasks, such as transcribing audio, producing natural-sounding voices, recognizing who is speaking, and translating speech.
We created this tool to help you set up a full-blown, scalable, and secure transcription pipeline through simple configuration and without any development effort. The Ingestion Client incorporates best practices for scaling transcription requests (to hundreds of thousands of files), error management, retry logic, and various other optimizations. Setup is carried out through an ARM template deployment. The architecture of the solution that this ARM template deploys is shown in the figure below.
When a user uploads an audio file to the dedicated Azure Storage container, a timer-triggered Azure Function picks the file up and creates a transcription request using either the Speech-to-text REST API v3.0 or the Speech SDK (the user's choice). When the transcription is successfully completed, the solution writes the transcript to the containers from which the audio file was obtained. Additionally, users can choose to apply analytics to the transcript, produce reports, or redact the transcript. Each of these options is enabled by additional resources deployed through the ARM template.
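To make the "creates a transcription request" step more concrete, here is a minimal Python sketch of what such a request to the Speech-to-text REST API v3.0 could look like. It is not the Ingestion Client's own code: the function name `build_transcription_request`, the default locale, and the display name are hypothetical choices made for illustration, and the audio URL is assumed to be a blob URL that the Speech service can read (for example, one carrying a SAS token). The sketch only builds the request and does not send it.

```python
import json
import urllib.request


def build_transcription_request(region, subscription_key, audio_urls,
                                locale="en-US",
                                display_name="Ingestion sketch"):
    """Build (but do not send) a batch transcription request.

    All parameter names and defaults are illustrative assumptions.
    """
    # Regional endpoint for creating transcriptions with REST API v3.0.
    endpoint = (
        f"https://{region}.api.cognitive.microsoft.com"
        "/speechtotext/v3.0/transcriptions"
    )
    # Request body: the audio files to transcribe and their language.
    body = {
        "contentUrls": list(audio_urls),
        "locale": locale,
        "displayName": display_name,
    }
    return urllib.request.Request(
        endpoint,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": subscription_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would submit the request. That step is left out here so the sketch can run without a Speech resource or credentials. In the deployed solution, the Azure Function handles this step for you, along with the retry logic and error management mentioned earlier.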