Speech services

Convert audio to text, translate speech and convert text to speech with the unified Speech services

Speech to Text – Converts spoken audio to text for intuitive interaction

Easily add real-time speech-to-text conversion to your applications for cases such as voice commands, real-time transcriptions or call centre log analysis.
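As a rough illustration of the voice-command case, the sketch below prepares a single recognition request for one short WAV clip. It assumes the short-audio REST endpoint, a 16 kHz PCM WAV recording, and the `Ocp-Apim-Subscription-Key` header; the function name, key and region are placeholders, and continuous real-time transcription would normally go through the Speech SDK rather than a one-off call like this.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_recognition_request(audio_bytes, key, region, language="en-GB"):
    """Prepare (but do not send) a speech-to-text request for one short clip.

    Assumptions: the endpoint path and header names follow the short-audio
    REST interface; `key` and `region` come from your own Speech resource.
    """
    query = urlencode({"language": language, "format": "simple"})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumed audio format: 16 kHz, 16-bit mono PCM in a WAV container.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return Request(url, data=audio_bytes, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the audio; the JSON reply is expected to carry the recognised text, though the exact field names should be checked against the current API reference.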

Tailor your speech recognition models to adapt to users’ speaking styles, expressions or unique vocabulary, and to accommodate specific background noises, accents and voice patterns depending on your scenario.

Text to Speech – Give natural voice to your apps

Build smart apps and services that speak to users naturally with the Text to Speech service. Convert text to audio in near real time, and adjust the speaking rate, pitch, volume and more.
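Speed, pitch and volume are typically controlled through SSML, the W3C markup for speech synthesis, using its `prosody` element. The sketch below only builds the SSML document; the helper name and the default voice `en-GB-SoniaNeural` are illustrative assumptions, and sending the markup to the service is left out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-GB-SoniaNeural", lang="en-GB",
               rate="medium", pitch="medium", volume="medium"):
    """Wrap `text` in SSML with prosody settings.

    The voice name is an assumed example; rate, pitch and volume take
    standard SSML labels such as "slow", "high" or "loud".
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} '
        f'volume={quoteattr(volume)}>'
        f'{escape(text)}'  # escape &, < and > so the markup stays well-formed
        '</prosody></voice></speak>'
    )


ssml = build_ssml("Your order has shipped.", rate="slow", pitch="high", volume="loud")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed XML
print(ssml)
```

Keeping the prosody values as plain parameters makes it easy to try different settings for the same sentence before settling on a voice style.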

Give your application a one-of-a-kind, recognisable brand voice using custom voice models. Simply record and upload training data, and the service will create a unique voice font tuned to your recording.

Speech translation

Give your app real-time speech translation capabilities in any of the supported languages and receive either a text or speech translation back. Speech translation models are based on leading-edge speech recognition and neural machine translation (NMT) technologies. They’re optimised to understand the way people speak in real life and generate translations of exceptional quality.

"We are impressed with the initial transcription accuracy of Custom Speech and Speaker Recognition. We are now working to optimise for a live environment which would be breakthrough for British Telecom Sport versus the current manual process."

Kevin Blyth, British Telecom Research and Innovation

Explore the Cognitive Services APIs

Computer Vision

Distill actionable information from images


Face

Detect, identify, analyse, organise and tag faces in photos

Video Indexer

Unlock video insights

Content moderator

Automated image, text and video moderation

Custom Vision

Easily customise your own state-of-the-art computer vision models for your unique use case

Text Analytics

Easily evaluate sentiment and topics to understand what users want

Translator Text

Easily conduct machine translation with a simple REST API call

Bing Spell Check

Detect and correct spelling mistakes in your app

QnA Maker

Distill information into conversational, easy-to-navigate answers

Language Understanding

Teach your apps to understand commands from your users


Speech to Text

The Speech to Text API is part of Azure Cognitive Services Speech Services

Speaker Recognition PREVIEW

Use speech to identify and verify individual speakers

Text to Speech

Convert text to speech to create more natural, accessible interfaces

Speech translation

Easily integrate real-time speech translation into your app

Anomaly detector PREVIEW

Easily add anomaly detection capabilities to your apps

Ready to supercharge your app?