Speech to Text

An AI Speech feature that accurately transcribes spoken audio to text.

Make spoken audio actionable

Quickly and accurately transcribe audio to text in more than 100 languages and variants. Customize models to improve accuracy for domain-specific terminology. Get more value from spoken audio by making transcribed text searchable, running analytics on it, or using it to trigger actions, all from your preferred programming language.

High-quality transcription

Get accurate audio-to-text transcriptions with state-of-the-art speech recognition.

Customizable models

Add specific words to your base vocabulary or build your own speech-to-text models.

Flexible deployment

Run Speech to Text anywhere—in the cloud or at the edge in containers.

Production-ready

Access the same robust technology that powers speech recognition across Microsoft products.

Accurately transcribe speech from various sources

Convert audio to text from a range of sources, including microphones, audio files, and blob storage. Use speaker diarization to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.
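Conceptually, diarized output is a list of time-stamped segments, each tagged with a speaker label. The minimal sketch below shows one way such segments could be turned into a readable "who said what and when" transcript. It assumes a hypothetical `Segment` record (speaker label, start and end offsets in seconds, text) and does not call the Azure Speech SDK; the SDK's actual result objects and field names differ and should be taken from its documentation.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical diarized segment: speaker label, start/end offsets in seconds, text."""
    speaker: str
    start: float
    end: float
    text: str


def format_timestamp(seconds: float) -> str:
    """Render an offset in seconds as MM:SS (fractional seconds are truncated)."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def render_turn(turn: Segment) -> str:
    """Format one speaker turn as '[start-end] speaker: text'."""
    return f"[{format_timestamp(turn.start)}-{format_timestamp(turn.end)}] {turn.speaker}: {turn.text}"


def build_transcript(segments: List[Segment]) -> List[str]:
    """Order segments by start time and merge consecutive segments from the same speaker."""
    lines: List[str] = []
    turn: Optional[Segment] = None  # speaker turn currently being accumulated
    for seg in sorted(segments, key=lambda s: s.start):
        if turn is not None and turn.speaker == seg.speaker:
            # Same speaker keeps talking: extend the current turn's end time and text.
            turn = Segment(turn.speaker, turn.start, seg.end, f"{turn.text} {seg.text}")
        else:
            # Speaker changed: emit the finished turn and start a new one.
            if turn is not None:
                lines.append(render_turn(turn))
            turn = seg
    if turn is not None:
        lines.append(render_turn(turn))
    return lines


if __name__ == "__main__":
    demo = [
        Segment("Speaker A", 0.0, 2.5, "Hi, thanks for calling."),
        Segment("Speaker A", 2.5, 4.0, "How can I help?"),
        Segment("Speaker B", 4.2, 7.9, "I'd like to check my order."),
    ]
    for line in build_transcript(demo):
        print(line)
```

Merging consecutive segments from the same speaker is only one design choice; an application that needs finer timing could keep each segment as its own line instead.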

Customize speech models to your needs

Tailor your speech models to understand organization- and industry-specific terminology. Overcome speech recognition barriers such as background noise, accents, or unique vocabulary. Customize your models by uploading audio data and transcripts. Automatically generate custom models using Office 365 data to optimize speech recognition accuracy for your organization.

Deploy anywhere

Run Speech to Text wherever your data resides. Build speech applications that take advantage of robust cloud capabilities, or run them on-premises in containers.

Fuel App Innovation with Cloud AI Services

Learn 5 key ways your organization can get started with AI to realize value quickly.

Comprehensive privacy and security

AI Speech, part of Azure AI Services, holds SOC, FedRAMP, PCI DSS, HIPAA, HITECH, and ISO certifications and attestations.
View and delete your custom speech data and models at any time. Your data is encrypted at rest.

Your data remains yours. Your audio input and transcription data aren't logged during audio processing.
Backed by Azure infrastructure, AI Speech offers enterprise-grade security, availability, compliance, and manageability.

Comprehensive security and compliance, built in

Microsoft invests more than $1 billion annually in cybersecurity research and development.

We employ more than 3,500 security experts who are dedicated to data security and privacy.

Azure has more certifications than any other cloud provider.

Flexible pricing gives you the control you need

With Speech to Text, pay as you go based on the number of hours of audio you transcribe, with no upfront costs.

Get started with an Azure free account

Start free. Get a $200 credit to use within 30 days. While you have your credit, get free amounts of many of our most popular services, plus free amounts of 55+ other services that are always free.

After your credit ends, move to pay-as-you-go pricing to keep building with the same free services. Pay only if you use more than your free monthly amounts.

After 12 months, you'll keep getting 55+ always-free services—and still pay only for what you use beyond your free monthly amounts.

Documentation and resources

Get started

Browse the documentation

Create an AI Speech service with the Microsoft Learn course

Explore code samples

Check out our sample code

See customization resources

Explore and customize your speech-to-text solution with Speech Studio. No code required.

Frequently asked questions about Speech to Text

What is Speech to Text?

Speech to Text is a feature within the Speech service that accurately and quickly transcribes audio to text.

What are AI Services?

AI Services are a collection of customizable, prebuilt AI models that can be used to add AI to applications. They span several domains, including Speech, Decision, Language, and Vision. Speech to Text is one feature within the Speech service. Other Speech-related features include Text to Speech, Speech Translation, and Speaker Recognition. An example of a Decision service is Personalizer, which allows you to deliver personalized, relevant experiences. Examples of Language services include Language Understanding, Text Analytics for natural language processing, QnA Maker for FAQ experiences, and Translator for language translation.
