Convert spoken audio to text for more natural interactions
Advanced speech recognition
Use Speech to Text – part of the Speech service – to swiftly convert audio into text from a variety of sources. Customise models to overcome common speech recognition barriers, such as unique vocabularies, speaking styles or background noise. Make audio more accessible by helping everyone to follow and engage in conversations in real time.
Benefit from leading-edge speech recognition accuracy powered by deep neural network models.
Transcribe audio to text in real time so that all participants in a conversation can fully engage.
Customised speech recognition
Tailor speech recognition to speaking styles and domain-specific terminology.
Run Speech to Text anywhere – in the cloud, on-premises or on the edge in containers.
Use breakthrough speech technology
Enhance your apps with speech capabilities powered by decades of breakthrough research. Microsoft was the first to reach human parity on the Switchboard conversational speech recognition task, and continues to drive cutting-edge research in speech recognition.
Learn more about advancements in speech.
Optimise speech recognition with tailored models
Customise your speech recognition models to overcome common speech recognition barriers. Tailor your language models to adapt to users’ speaking styles, accents or unique vocabulary, such as place names, products and industry-specific expressions. Automatically generate custom models using your Office 365 data to optimise speech recognition accuracy for organisation-specific terms.
Gain insights from your conversations
Transcribe multi-user conversations in real time, allowing participants to focus on the discussion. Identify who said what, when, and quickly follow up on next steps. Optimise the experience for multi-microphone devices. Enable analytics on your transcribed text to extract further insights from your conversations.
Learn more about the conversation transcription capability.
Deploy anywhere, from the cloud to the edge
Run Speech to Text in the cloud or on premises with containers for scenarios where data security and low latency are paramount.
Learn more about Speech in containers.
Security for the enterprise
Microsoft invests more than USD 1 billion annually in cybersecurity research and development.
We employ more than 3,500 security experts completely dedicated to your data security and privacy.
Azure has more compliance certifications than any other cloud service provider. View the comprehensive list.
Get the power, control and customisation you need with flexible pricing
Only pay for what you use, with no upfront costs. With Speech to Text, you pay as you go, based on hours of audio transcribed.
See Speech to Text pricing.
Get started with Speech to Text in three steps
Developer resources for Speech to Text
Documentation and tutorials
Get started with Speech to Text.
Take a Pluralsight course that walks you through using Speech to Text.
Learn more about scenarios for Speech to Text, such as conversation and call centre transcription.
Frequently asked questions about Speech to Text
For a full list of languages supported by Speech to Text, see our documentation.
Easily capture audio from a microphone, read from a stream, or access audio files from storage with the Speech SDK and REST APIs. The Speech SDK supports WAV/PCM 16-bit, 16 kHz/8 kHz, single-channel audio for speech recognition. Additional audio formats are supported using the speech-to-text REST endpoint or the batch transcription service.
Check the regional availability.
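The audio format requirements above (WAV/PCM, 16-bit, 16 kHz or 8 kHz, single channel) can be checked locally before audio is sent for recognition. The following is a minimal sketch, assuming the audio is a WAV file on disk. It uses only Python's standard `wave` module and does not call the Speech SDK; the function names `find_format_problems` and `write_silent_wav` are hypothetical helpers introduced for illustration.

```python
import os
import tempfile
import wave

# Format limits taken from the text above: 16-bit PCM, 16 kHz or 8 kHz, mono.
SUPPORTED_SAMPLE_RATES_HZ = {8000, 16000}
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
REQUIRED_CHANNELS = 1            # single-channel audio


def find_format_problems(path):
    """Return a list of format problems; an empty list means the file matches."""
    try:
        # The wave module only reads uncompressed PCM and raises wave.Error otherwise.
        wav = wave.open(path, "rb")
    except wave.Error as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    with wav:
        if wav.getsampwidth() != REQUIRED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8}-bit, expected 16-bit")
        if wav.getframerate() not in SUPPORTED_SAMPLE_RATES_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected 8000 or 16000 Hz")
        if wav.getnchannels() != REQUIRED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels found, expected 1")
    return problems


def write_silent_wav(path, sample_rate_hz, channels, sample_width_bytes, seconds=0.1):
    """Hypothetical helper: write a short silent WAV file to try the check on."""
    frame_count = int(sample_rate_hz * seconds)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width_bytes)
        wav.setframerate(sample_rate_hz)
        wav.writeframes(b"\x00" * (frame_count * channels * sample_width_bytes))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        sample = os.path.join(folder, "sample.wav")
        write_silent_wav(sample, sample_rate_hz=44100, channels=2, sample_width_bytes=2)
        for problem in find_format_problems(sample):
            print(problem)
```

A file that fails this check is not necessarily unusable: as noted above, additional formats are supported through the speech-to-text REST endpoint or the batch transcription service.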