Speech Services

Convert audio to text, perform speech translation and text-to-speech with the unified Speech services

Speech to Text – Convert spoken audio to text for intuitive interaction

Easily add real-time speech-to-text capabilities to your applications for scenarios like voice commands, conversation transcription and call centre log analysis.

Tailor your speech recognition models to adapt to users’ speaking styles, expressions and unique vocabularies and to accommodate background noises, accents and voice patterns.


Text to Speech – Give natural voice to your apps

Build smart apps and services that speak to users naturally with the Text to Speech service. Convert text to audio in near real time, and adjust the speaking rate, pitch, volume and more.
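Speaking rate, pitch and volume are commonly controlled through SSML, the W3C Speech Synthesis Markup Language, using its `prosody` element. The following is a minimal sketch of how such markup could be assembled, not the service's own client code: the helper name `build_ssml` and the voice name `en-GB-ExampleVoice` are hypothetical placeholders, and the SSML features a given voice accepts should be checked against the service documentation.

```python
from xml.sax.saxutils import escape, quoteattr

# W3C SSML 1.0 namespace.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-GB-ExampleVoice", lang="en-GB",
               rate="medium", pitch="medium", volume="medium"):
    """Wrap plain text in SSML with a prosody element.

    `voice` is a hypothetical placeholder, not a real voice name.
    `rate`, `pitch` and `volume` take SSML prosody values such as
    "slow", "high" or "loud".
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} '
        f'volume={quoteattr(volume)}>'
        f'{escape(text)}'  # escape &, < and > so the markup stays well-formed
        '</prosody></voice></speak>'
    )


if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", rate="slow", volume="loud"))
```

The resulting string would then be passed to whichever text-to-speech client you use, in place of plain text.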

Give your application a one-of-a-kind, recognisable brand voice using custom voice models. Simply record and upload training data and the service will create a unique voice font tuned to your recording.


Speech Translation

Give your app real-time speech translation capabilities in any of the supported languages and receive either a text or speech translation back. Speech Translation models are based on leading-edge speech recognition and neural machine translation (NMT) technologies. They are optimised to understand the way people speak in real life and generate translations of exceptional quality.


Business scenarios built on Speech Services

Easily transcribe every call and optimise results through batch transcription and custom speech services enhanced for call centre scenarios. Index call transcriptions for full-text search, or apply text analytics to detect sentiment, language and key phrases for insights.

"We are impressed with the initial transcription accuracy of Custom Speech and Speaker Recognition. We are now working to optimise for a live environment which would be breakthrough for British Telecom Sport versus the current manual process."

Kevin Blyth, British Telecom Research and Innovation

Explore the Cognitive Services APIs

Computer Vision

Distill actionable information from images

Face

Detect, identify, analyse, organise and tag faces in photos

Video Indexer

Unlock video insights

Custom Vision

Easily customise your own state-of-the-art computer vision models for your unique use case

Form Recogniser

The AI-powered document extraction service that understands your forms

Text Analytics

Easily evaluate sentiment and topics to understand what users want

Translator

Easily conduct machine translation with a simple REST API call

QnA Maker

Distill information into conversational, easy-to-navigate answers

Language Understanding

Teach your apps to understand commands from your users

Immersive Reader

Empower users of all ages and abilities to read and comprehend text

Speech Services

Unified speech services for speech-to-text, text-to-speech and speech translation

Speaker Recognition

A Speech service feature that verifies and identifies speakers

Speech Translation

Easily integrate real-time speech translation into your app

Speech to Text

A Speech service feature that accurately converts spoken audio to text

Text to Speech

A Speech service feature that converts text to lifelike speech

Content Moderator

Automated image, text and video moderation

Anomaly Detector

Easily add anomaly detection capabilities to your apps

Personaliser

An AI service that delivers a personalised user experience

Metrics Advisor

An AI service that monitors metrics and diagnoses issues
