Speech to Text
A Speech service feature that accurately transcribes spoken audio to text
Make spoken audio actionable
Quickly and accurately transcribe audio to text in more than 100 languages and variants. Customise models to improve accuracy for domain-specific terminology. Get more value from spoken audio by searching or analysing the transcribed text, or by using it to trigger actions, all from your preferred programming language.
High-quality transcription
Get accurate audio-to-text transcriptions with state-of-the-art speech recognition.
Customisable models
Add specific words to your base vocabulary or build your own speech-to-text models.
Flexible deployment
Run Speech to Text anywhere – in the cloud or at the edge in containers.
Production-ready
Access the same robust technology that powers speech recognition across Microsoft products.
Accurately transcribe speech from various sources
Convert audio to text from a range of sources, including microphones, audio files, and blob storage. Use speaker diarisation to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.
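The service performs the speaker diarisation itself. The sketch below only shows what "who said what and when" looks like once that output is available. It assumes a hypothetical list of segments, each with a speaker label, a start offset in seconds and the recognised text. The `Segment` type and its field names are illustrative and are not the Speech SDK's result schema. The sketch renders the segments as a time-stamped transcript and merges consecutive segments from the same speaker into one turn.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    # Hypothetical shape of one diarised utterance; field names are
    # assumptions for this sketch, not the Speech service's result schema.
    speaker: str   # speaker label assigned by diarisation, e.g. "Speaker-1"
    start: float   # offset from the start of the audio, in seconds
    text: str      # recognised text, already formatted and punctuated


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as mm:ss, truncated to whole seconds."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_transcript(segments: List[Segment]) -> str:
    """Sort segments by start time and merge consecutive turns by one speaker."""
    lines: List[str] = []
    last_speaker: Optional[str] = None
    for seg in sorted(segments, key=lambda s: s.start):
        if seg.speaker == last_speaker:
            # Same speaker is still talking: extend their current line.
            lines[-1] += " " + seg.text
        else:
            # New speaker turn: start a new time-stamped line.
            lines.append(f"[{format_timestamp(seg.start)}] {seg.speaker}: {seg.text}")
            last_speaker = seg.speaker
    return "\n".join(lines)


if __name__ == "__main__":
    segments = [
        Segment("Speaker-2", 4.2, "Hi, thanks for calling."),
        Segment("Speaker-1", 0.5, "Hello?"),
        Segment("Speaker-2", 6.0, "How can I help?"),
    ]
    print(render_transcript(segments))
    # [00:00] Speaker-1: Hello?
    # [00:04] Speaker-2: Hi, thanks for calling. How can I help?
```

Merging adjacent segments from the same speaker keeps the transcript readable when the recogniser splits one speaker's turn into several utterances.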
Customise speech models to your needs
Tailor your speech models to understand organisation- and industry-specific terminology. Overcome speech recognition barriers such as background noise, accents or unique vocabulary. Customise your models by uploading audio data and transcripts, or automatically generate customised models from Office 365 data to optimise speech recognition accuracy for your organisation.
Deploy anywhere
Run Speech to Text wherever your data resides. Build speech applications that take advantage of robust cloud capabilities, or run them on-premises in containers.
Comprehensive privacy and security
- Speech service, part of Azure Cognitive Services, complies with SOC, FedRAMP, PCI DSS, HIPAA, HITECH and ISO standards.
- Your data remains yours. Your audio input and transcription data aren’t logged during audio processing.
- View and delete your custom speech data and models at any time. Your data is encrypted while it’s in storage.
- Backed by Azure infrastructure, Speech service offers enterprise-grade security, availability, compliance and manageability.
Flexible pricing gives you the control you need
With Speech to Text, pay as you go based on the number of hours of audio you transcribe, with no upfront costs.
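Billing scales with the hours of audio you transcribe, so you can produce a rough budget with simple arithmetic. This is a minimal sketch that assumes cost is just total audio hours multiplied by an hourly rate. The rate passed in is a placeholder, not a published price. Real billing details such as tiers, rounding or free allowances are not modelled.

```python
from typing import Iterable


def estimate_cost(durations_seconds: Iterable[float], rate_per_hour: float) -> float:
    """Estimate a pay-as-you-go charge from a batch of audio durations.

    Assumes cost = total audio hours x hourly rate. Pricing tiers, rounding
    rules and free allowances are deliberately NOT modelled.
    """
    durations = list(durations_seconds)
    if rate_per_hour < 0 or any(d < 0 for d in durations):
        raise ValueError("durations and rate must be non-negative")
    total_hours = sum(durations) / 3600  # seconds -> hours
    return total_hours * rate_per_hour


if __name__ == "__main__":
    # Placeholder rate for illustration only; check the official pricing page.
    PLACEHOLDER_RATE_PER_HOUR = 1.00
    calls = [1800, 2700, 2700]  # three calls: 30, 45 and 45 minutes
    print(estimate_cost(calls, PLACEHOLDER_RATE_PER_HOUR))  # 2.0
```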
Documentation and resources
Explore code samples
Take a look at our sample code
See customisation resources
Explore and customise your voice-to-text solution with Speech Studio. No code required.
Businesses that trust Speech to Text
KPMG streamlines call transcription
KPMG uses Speech to Text to transcribe and catalogue thousands of hours of calls, reducing compliance costs for its clients by as much as 80 per cent.

Motorola helps first responders access vital data using voice
Motorola Solutions is helping police officers and other emergency first responders gain faster access to important information with a voice-powered virtual assistant.

Universal Electronics delivers voice-enabled smart home experiences
Universal Electronics is helping brands deliver voice-enabled navigation and control capabilities that work across everyday devices in the home, offering a unique consumer experience.

Hochtief documents construction defects using voice
Hochtief is helping project managers identify and document construction defects at project sites with a voice-enabled virtual assistant.

NTT DATA accelerates decision-making with meeting insights
NTT DATA is unlocking insights from speech data with real-time meeting transcription. With Custom Speech, the company can customise speech recognition models to understand organisation-specific terms.

Insight powers conversational banking experiences
Insight Enterprises is helping banks bring digital speed and convenience to their branches with a conversational AI-powered banking solution. Speech to Text converts what customers say into data that can be processed and analysed so that customers can get timely, relevant responses.

Frequently asked questions about Speech to Text
Speech to Text is a feature within the Speech service that quickly and accurately transcribes audio to text.
Cognitive Services are a collection of customisable, prebuilt AI models that can be used to add AI to applications. There are a variety of domains, including speech, decision, language and vision. Speech to Text is one feature within the Speech service. Other speech-related features include Text to Speech, Speech Translation and Speaker Recognition. An example of a Decision service is Personaliser, which allows you to deliver personalised, relevant experiences. Examples of language services include Language Understanding, Text Analytics for natural language processing, QnA Maker for FAQ experiences and Translator for language translation.