Speech services

Convert audio to text, translate speech and convert text to speech with the unified Speech services

Speech to Text – Converts spoken audio to text for intuitive interaction

Easily add real-time speech-to-text capabilities to your applications for scenarios like voice commands, conversation transcription and call centre log analysis.

Tailor your speech recognition models to adapt to users’ speaking styles, expressions and unique vocabularies, and to accommodate background noises, accents and voice patterns.


Text to Speech – Give natural voice to your apps

Build smart apps and services that speak to users naturally with the Text to Speech service. Convert text to audio in near real time, and adjust the speaking rate, pitch, volume and more.
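As a rough illustration of how rate, pitch and volume adjustments are usually expressed, the sketch below builds a small SSML document (the W3C Speech Synthesis Markup Language) using its standard prosody element. The helper name build_ssml and the default voice name are assumptions made for this example, not something this page specifies, and sending the markup to the service is left out.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text, voice="en-GB-SoniaNeural", lang="en-GB",
               rate="medium", pitch="medium", volume="medium"):
    """Return an SSML string that wraps `text` in a <prosody> element.

    `build_ssml` is a hypothetical helper; the default voice name is an
    assumption and should be replaced with a voice available to you.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": lang,
    })
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    prosody = ET.SubElement(voice_el, "prosody", {
        "rate": rate,      # e.g. x-slow, slow, medium, fast, x-fast
        "pitch": pitch,    # e.g. x-low, low, medium, high, x-high
        "volume": volume,  # e.g. silent, soft, medium, loud, x-loud
    })
    prosody.text = text    # ElementTree escapes &, < and > on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", rate="slow", volume="loud"))
```

The resulting string is the kind of input a speech synthesis request would carry; which voices and prosody values are accepted depends on the service and should be checked against its documentation.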

Give your application a one-of-a-kind, recognisable brand voice using custom voice models. Simply record and upload training data, and the service will create a unique voice font tuned to your recording.


Speech translation

Give your app real-time speech translation capabilities in any of the supported languages and receive either a text or speech translation back. Speech translation models are based on leading-edge speech recognition and neural machine translation (NMT) technologies. They’re optimised to understand the way people speak in real life and generate translations of exceptional quality.


Business scenarios built on Speech Services

Easily transcribe every call and optimise results through batch transcription and custom speech services enhanced for call centre scenarios. Index call transcriptions for full-text search, or apply text analytics to detect sentiment, language and key phrases for insights.
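To make the idea of indexing call transcriptions for full-text search concrete, here is a minimal sketch of an inverted index over transcript text. It is a toy illustration only and does not represent how the service or any search product works; the call IDs, sample transcripts and function names are all hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(transcripts):
    """Map each token to the set of call IDs whose transcript contains it."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for token in tokenize(text):
            index[token].add(call_id)
    return index


def search(index, query):
    """Return the call IDs whose transcripts contain every query word."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    # Hypothetical transcripts, keyed by a made-up call ID.
    transcripts = {
        "call-001": "I would like to cancel my broadband contract",
        "call-002": "My broadband is slow today",
        "call-003": "Please cancel the order",
    }
    index = build_index(transcripts)
    print(sorted(search(index, "cancel broadband")))
```

A production setup would normally hand transcripts to a dedicated search engine rather than keep an in-memory dictionary; the sketch only shows why transcribed text becomes searchable once it is tokenised and indexed.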

"We are impressed with the initial transcription accuracy of Custom Speech and Speaker Recognition. We are now working to optimise for a live environment which would be breakthrough for British Telecom Sport versus the current manual process."

Kevin Blyth, British Telecom Research and Innovation

Explore the Cognitive Services APIs

Computer Vision

Distill actionable information from images

Face

Detect, identify, analyse, organise and tag faces in photos

Ink Recogniser PREVIEW

An AI service that recognises digital ink content, such as handwriting, shapes and ink document layout

Video Indexer

Unlock video insights

Custom Vision

Easily customise your own state-of-the-art computer vision models for your unique use case

Form Recogniser PREVIEW

The AI-powered document extraction service that understands your forms

Text Analytics

Easily evaluate sentiment and topics to understand what users want

Translator Text

Easily conduct machine translation with a simple REST API call

QnA Maker

Distill information into conversational, easy-to-navigate answers

Language Understanding

Teach your apps to understand commands from your users

Immersive Reader PREVIEW

Empower users of all ages and abilities to read and comprehend text

Speech services

Unified speech services for speech-to-text, text-to-speech and speech translation

Speaker Recognition PREVIEW

Use speech to identify and verify individual speakers

Content Moderator

Automated image, text and video moderation

Anomaly Detector PREVIEW

Easily add anomaly detection capabilities to your apps

Personaliser PREVIEW

An AI service that delivers a personalised user experience

Ready to supercharge your app?