Speech Services

Convert audio to text, translate speech, and synthesize speech from text with the unified Speech Services

Speech to Text – Converts spoken audio to text for intuitive interaction

Easily add real-time speech-to-text conversion to your applications for cases like voice commands, real-time transcriptions, or call center log analysis.

Tailor your speech recognition models to adapt to users’ speaking styles, expressions, or unique vocabulary, and to accommodate specific background noises, accents, and voice patterns depending on your scenario.

Text to Speech – Give natural voice to your apps

Build smart apps and services that speak to users naturally with the Text to Speech service. Convert text to audio in near real time, and adjust the speaking rate, pitch, volume, and more.
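
As a rough illustration of what adjusting speed, pitch, and volume can look like, the sketch below builds a Speech Synthesis Markup Language (SSML) document using the standard `prosody` element from the W3C SSML specification. This page does not describe the request format, so treat it as an assumption that the service accepts SSML in this shape. The `build_ssml` helper and the voice name `en-US-ExampleVoice` are hypothetical placeholders, and sending the document to the service is not shown.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

# Namespace defined by the W3C SSML 1.0 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-US-ExampleVoice", rate="medium",
               pitch="medium", volume="medium", lang="en-US"):
    """Return an SSML string that wraps `text` in a <prosody> element.

    `voice` is a hypothetical placeholder, not a real voice name.
    `rate`, `pitch`, and `volume` take SSML prosody values such as
    "slow", "high", or "soft".
    """
    ssml = (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}"
        "</prosody></voice></speak>"
    )
    # Parse once so malformed markup fails here rather than at request time.
    ET.fromstring(ssml)
    return ssml


if __name__ == "__main__":
    print(build_ssml("Welcome back!", rate="slow", pitch="high", volume="soft"))
```

Escaping the text and quoting the attribute values keeps user-supplied strings from breaking the markup, which matters when the spoken text comes from application data.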

Give your application a one-of-a-kind, recognizable brand voice using custom voice models. Simply record and upload training data, and the service will create a unique voice font tuned to your recording.

Speech Translation

Give your app real-time speech translation capabilities in any of the supported languages and receive either a text or speech translation back. Speech Translation models are based on leading-edge speech recognition and neural machine translation (NMT) technologies. They're optimized to understand the way people speak in real life and generate translations of exceptional quality.

"We are impressed with the initial transcription accuracy of Custom Speech and Speaker Recognition. We are now working to optimise for a live environment which would be breakthrough for British Telecom Sport versus the current manual process."

Kevin Blyth, British Telecom Research and Innovation

Explore the Cognitive Services APIs

Computer Vision

Distill actionable information from images

Face

Detect, identify, analyze, organize, and tag faces in photos

Video Indexer

Unlock video insights

Content Moderator

Automated image, text, and video moderation

Custom Vision PREVIEW

Easily customize your own state-of-the-art computer vision models for your unique use case

Text Analytics

Easily evaluate sentiment and topics to understand what users want

Translator Text

Easily conduct machine translation with a simple REST API call

Bing Spell Check

Detect and correct spelling mistakes in your app

Language Understanding

Teach your apps to understand commands from your users

Speech Services

Unified speech services for speech-to-text, text-to-speech and speech translation

Speaker Recognition PREVIEW

Use speech to identify and verify individual speakers

QnA Maker

Distill information into conversational, easy-to-navigate answers
