Convert spoken audio to text for more natural interactions

Make spoken audio actionable

Quickly and accurately transcribe audio to text in more than 30 languages. Customise models to improve accuracy for domain-specific terminology. Get more value from spoken audio by making the transcribed text searchable, analysing it or using it to trigger actions, all in your preferred programming language.

High-quality transcription

Get accurate transcriptions with state-of-the-art speech recognition.

Customisable models

Add specific words to your base vocabulary or build your own models.

Flexible deployment

Run Speech to Text anywhere – in the cloud or at the edge in containers.


Access the same robust technology that powers speech recognition across Microsoft products.

Transcribe speech accurately from various sources

Convert audio to text from a range of sources, including microphones, audio files and blob storage. Use speaker diarisation to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.

Customise speech models to your needs

Tailor your speech models to understand organisation- and industry-specific terminology. Overcome common speech recognition barriers such as background noise, accents and unique vocabulary. Customise your models by uploading audio data and transcripts, or automatically generate custom models from Office 365 data to optimise speech recognition accuracy for your organisation.

Deploy anywhere, from the cloud to the edge

Run Speech to Text wherever your data resides. Build speech applications that use robust cloud capabilities or, with containers (preview), run locally at the edge. Speech containers support both standard and custom speech models.

Comprehensive privacy and security

  • The Speech service, part of Azure Cognitive Services, meets SOC, FedRAMP, PCI DSS, HIPAA, HITECH and ISO compliance standards.
  • Your data remains yours. Your audio input and transcription data aren’t logged during audio processing.
  • View and delete your custom speech data and models at any time. Your data is encrypted while it’s in storage.
  • Backed by Azure infrastructure, the Speech service offers enterprise-grade security, availability, compliance and manageability.

Flexible pricing gives you the power and control you need

Pay only for what you use, with no upfront costs. Speech to Text is billed on a pay-as-you-go basis, according to the number of hours of audio you transcribe.

Documentation and resources

Explore code samples

See customisation resources

Customise your speech solution with Speech Studio. No code required.


KPMG uses the customisation capabilities of Speech to Text to streamline call transcription and translation, achieving transcription accuracy of 90 per cent or better.