Speech to text

An AI Speech feature that accurately transcribes spoken audio to text.

Make spoken audio actionable

Quickly and accurately transcribe audio to text in more than 100 languages and variants. Customize models to enhance accuracy for domain-specific terminology. Get more value from spoken audio by enabling search or analytics on transcribed text or facilitating action—all in your preferred programming language.

High-quality transcription

Get accurate audio to text transcriptions with state-of-the-art speech recognition.

Customizable models

Add specific words to your base vocabulary or build your own speech-to-text models.

Flexible deployment

Run Speech to Text anywhere—in the cloud or at the edge in containers.


Access the same robust technology that powers speech recognition across Microsoft products.

Accurately transcribe speech from various sources

Convert audio to text from a range of sources, including microphones, audio files, and blob storage. Use speaker diarization to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.
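
As a rough illustration of transcribing an audio file, the sketch below builds (but does not send) an HTTP request in the shape of the Speech service's REST API for short audio. The endpoint path, query parameters, and header names reflect that API as commonly documented and should be verified against the current documentation. The function name `build_recognition_request` and the region and key values are placeholders invented for this example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_recognition_request(region: str, key: str, wav_bytes: bytes,
                              language: str = "en-US") -> Request:
    """Build a short-audio recognition request without sending it.

    Assumptions: `region` is the Azure region of your Speech resource,
    `key` is its subscription key, and `wav_bytes` holds 16 kHz PCM WAV audio.
    """
    # Query string: recognition language and a detailed result format.
    query = urlencode({"language": language, "format": "detailed"})
    url = (f"https://{region}.stt.speech.microsoft.com"
           f"/speech/recognition/conversation/cognitiveservices/v1?{query}")
    headers = {
        # Authenticates the call with the resource key.
        "Ocp-Apim-Subscription-Key": key,
        # Declares the audio encoding of the request body.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return Request(url, data=wav_bytes, headers=headers, method="POST")


if __name__ == "__main__":
    # Placeholder values; nothing is sent over the network here.
    req = build_recognition_request("westus", "YOUR_KEY", b"RIFF")
    print(req.get_method(), req.full_url)
```

To actually send the request, pass the returned object to `urllib.request.urlopen` and parse the JSON response body. Microphone input, blob storage, and diarization are typically handled through the Speech SDK or batch transcription rather than this single-request form.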

Customize speech models to your needs

Tailor your speech models to understand organization- and industry-specific terminology. Overcome speech recognition barriers such as background noise, accents, or unique vocabulary. Customize your models by uploading audio data and transcripts. Automatically generate custom models using Office 365 data to optimize speech recognition accuracy for your organization.

Deploy anywhere

Run Speech to Text wherever your data resides. Build speech applications optimized for both robust cloud capabilities and on-premises deployment using containers.

Fuel App Innovation with Cloud AI Services

Learn 5 key ways your organization can get started with AI to realize value quickly.

Comprehensive privacy and security

  • AI Speech, part of Azure AI Services, is certified by SOC, FedRAMP, PCI DSS, HIPAA, HITECH, and ISO.

  • View and delete your custom speech data and models at any time. Your data is encrypted while it's in storage.

  • Your data remains yours. Your audio input and transcription data aren't logged during audio processing.

  • Backed by Azure infrastructure, AI Speech offers enterprise-grade security, availability, compliance, and manageability.

Comprehensive security and compliance, built in

Get started with an Azure free account


Start free. Get a USD 200 credit to use within 30 days. While you have your credit, get free amounts of many of our most popular services, plus free amounts of 55+ other services that are always free.


After your credit, move to pay as you go to keep building with the same free services. Pay only if you use more than your free monthly amounts.


After 12 months, you'll keep getting 55+ always-free services—and still pay only for what you use beyond your free monthly amounts.

Documentation and resources

Get started

Browse the documentation

Create an AI Speech service with the Microsoft Learn course

Explore code samples

Check out our sample code

See customization resources

Explore and customize your voice-to-text solution with Speech Studio. No code required.

Frequently asked questions about Speech to Text

  • Speech to Text is a feature within the Speech service that accurately and quickly transcribes audio to text.

  • AI Services are a collection of customizable, prebuilt AI models that can be used to add AI to applications. They span a variety of domains, including Speech, Decision, Language, and Vision. Speech to Text is one feature within the AI Speech service. Other Speech-related features include Text to Speech, Speech Translation, and Speaker Recognition. An example of a Decision service is Personalizer, which allows you to deliver personalized, relevant experiences. Examples of AI Language services include Language Understanding, Text Analytics for natural language processing, QnA Maker for FAQ experiences, and Translator for language translation.

Start building with AI Services

Try Speech to text free