Speech to Text – Convert spoken audio to text for intuitive interaction
Easily add real-time speech-to-text capabilities to your applications for scenarios like voice commands, conversation transcription, and call center log analysis.
Tailor your speech recognition models to adapt to users’ speaking styles, expressions, and unique vocabularies, and to accommodate background noises, accents, and voice patterns.
Text to Speech – Give natural voice to your apps
Build smart apps and services that speak to users naturally with the Text to Speech service. Convert text to audio in near real time, and adjust the speaking rate, pitch, volume, and more.
Give your application a one-of-a-kind, recognizable brand voice using custom voice models. Simply record and upload training data, and the service will create a unique voice font tuned to your recording.
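Adjustments to rate, pitch, and volume are commonly expressed in SSML (Speech Synthesis Markup Language), the W3C markup that speech synthesizers accept. The following is a minimal sketch of building such a request body, not the service's own client code. The function name build_ssml and the voice name "example-voice" are hypothetical, and the exact set of accepted attribute values should be checked against the service documentation.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"  # W3C SSML namespace


def build_ssml(text, voice, rate="medium", pitch="medium",
               volume="medium", lang="en-US"):
    """Return an SSML document that wraps `text` in a <prosody> element.

    `voice` is a placeholder; real voice names come from the service.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>"
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>"
        f"{escape(text)}"  # escape &, <, > so the markup stays well-formed
        "</prosody></voice></speak>"
    )


if __name__ == "__main__":
    # Hypothetical voice name, used only for illustration.
    print(build_ssml("Your order has shipped.", voice="example-voice",
                     rate="slow", pitch="high", volume="loud"))
```

The resulting string would be sent as the body of a synthesis request. Keeping the prosody settings as parameters makes it easy to vary delivery per message without touching the text itself.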
Speech Translation
Give your app real-time speech translation capabilities in any of the supported languages and receive either a text or speech translation back. Speech Translation models are based on leading-edge speech recognition and neural machine translation (NMT) technologies. They're optimized to understand the way people speak in real life and generate translations of exceptional quality.
Business scenarios built on Speech Services
Easily transcribe every call and optimize results through batch transcription and custom speech services enhanced for call center scenarios. Index call transcriptions for full-text search, or apply text analytics to detect sentiment, language, and key phrases for insights.
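To illustrate what indexing call transcriptions for full-text search can look like, here is a minimal sketch of an inverted index over transcript text. It is an assumption-laden toy, not how the service or any particular search product works: the function names build_index and search, the call IDs, and the sample transcripts are all hypothetical.

```python
import re
from collections import defaultdict

WORD = re.compile(r"[a-z0-9']+")


def build_index(transcripts):
    """Map each lowercase word to the set of call IDs whose transcript contains it."""
    index = defaultdict(set)
    for call_id, text in transcripts.items():
        for word in WORD.findall(text.lower()):
            index[word].add(call_id)
    return index


def search(index, query):
    """Return the call IDs whose transcripts contain every word in `query`."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


if __name__ == "__main__":
    # Hypothetical transcripts, standing in for batch transcription output.
    transcripts = {
        "call-1": "I want to cancel my subscription",
        "call-2": "My subscription renewal failed",
    }
    idx = build_index(transcripts)
    print(search(idx, "cancel subscription"))
```

A production system would typically hand the transcripts to a dedicated search service instead, but the same idea applies: once audio is text, every call becomes searchable by keyword.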
"We are impressed with the initial transcription accuracy of Custom Speech and Speaker Recognition. We are now working to optimise for a live environment which would be breakthrough for British Telecom Sport versus the current manual process."
Explore the Cognitive Services APIs
Computer Vision
Distill actionable information from images
Face
Detect, identify, analyze, organize, and tag faces in photos
Video Indexer
Unlock video insights
Custom Vision
Easily customize your own state-of-the-art computer vision models for your unique use case
Form Recognizer
The AI-powered document extraction service that understands your forms
Text Analytics
Easily evaluate sentiment and topics to understand what users want
Translator
Easily conduct machine translation with a simple REST API call
QnA Maker
Distill information into conversational, easy-to-navigate answers
Language Understanding
Teach your apps to understand commands from your users
Immersive Reader
Empower users of all ages and abilities to read and comprehend text
Speech Services
Unified speech services for speech-to-text, text-to-speech and speech translation
Speaker Recognition (Preview)
A Speech service feature that verifies and identifies speakers
Speech Translation
Easily integrate real-time speech translation into your app
Speech to Text
A Speech service feature that accurately converts spoken audio to text
Text to Speech
A Speech service feature that converts text to lifelike speech
Content Moderator
Automated image, text, and video moderation
Anomaly Detector
Easily add anomaly detection capabilities to your apps
Personalizer
An AI service that delivers a personalized user experience
Metrics Advisor (Preview)
An AI service that monitors metrics and diagnoses issues