
Speech to text

A Speech service feature that accurately transcribes spoken audio to text.

Make spoken audio actionable

Quickly and accurately transcribe audio to text in more than 100 languages and variants. Customize models to enhance accuracy for domain-specific terminology. Get more value from spoken audio by making transcribed text searchable, analyzing it, or using it to trigger actions, all in your preferred programming language.

High-quality transcription

Get accurate audio-to-text transcriptions with state-of-the-art speech recognition.

Customizable models

Add specific words to your base vocabulary or build your own speech-to-text models.

Flexible deployment

Run Speech to Text anywhere—in the cloud or at the edge in containers.

Production-ready

Access the same robust technology that powers speech recognition across Microsoft products.

Accurately transcribe speech from various sources

Convert audio to text from a range of sources, including microphones, audio files, and blob storage. Use speaker diarization to determine who said what and when. Get readable transcripts with automatic formatting and punctuation.
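To make "who said what and when" concrete, here is a minimal sketch of what you can do with diarized output once you have it. It assumes a simplified, hypothetical segment shape (`Segment` with `speaker`, `start_s`, `end_s`, and `text` fields); this is not the Speech service's actual response schema, so you would first map the service's results into this shape. The helpers `format_transcript` and `talk_time` are likewise illustrative names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical shape of one diarized segment; NOT the service's real schema.
    speaker: str    # speaker label assigned by diarization, e.g. "Guest-1"
    start_s: float  # start offset of the segment, in seconds
    end_s: float    # end offset of the segment, in seconds
    text: str       # transcribed text for this segment


def format_transcript(segments: list[Segment]) -> list[str]:
    """Render 'who said what and when', ordered by start time."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start_s):
        lines.append(f"[{seg.start_s:.1f}s-{seg.end_s:.1f}s] {seg.speaker}: {seg.text}")
    return lines


def talk_time(segments: list[Segment]) -> dict[str, float]:
    """Total seconds of speech attributed to each speaker label."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end_s - seg.start_s
    return dict(totals)


if __name__ == "__main__":
    segs = [
        Segment("Guest-2", 3.5, 6.0, "Fine, thanks."),
        Segment("Guest-1", 0.0, 3.5, "Hi, how are you?"),
    ]
    print("\n".join(format_transcript(segs)))
    print(talk_time(segs))
```

Sorting by start time means segments can arrive in any order (for example, from parallel processing) and still produce a readable, chronological transcript.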

Customize speech models to your needs

Tailor your speech models to understand organization- and industry-specific terminology. Overcome speech recognition barriers such as background noise, accents, or unique vocabulary. Customize your models by uploading audio data and transcripts. Automatically generate custom models using Office 365 data to optimize speech recognition accuracy for your organization.

Deploy anywhere

Run Speech to Text wherever your data resides. Build speech applications that take advantage of robust cloud capabilities, or run them on-premises using containers.

Fuel App Innovation with Cloud AI Services

Learn 5 key ways your organization can get started with AI to realize value quickly.

Comprehensive privacy and security

  • Speech service, part of Azure Cognitive Services, is certified by SOC, FedRAMP, PCI DSS, HIPAA, HITECH, and ISO.

  • View and delete your custom speech data and models at any time. Your data is encrypted while it's in storage.

  • Your data remains yours. Your audio input and transcription data aren't logged during audio processing.

  • Backed by Azure infrastructure, Speech service offers enterprise-grade security, availability, compliance, and manageability.

Comprehensive security and compliance, built in

  • Microsoft invests more than $1 billion annually in cybersecurity research and development.

  • We employ more than 3,500 security experts who are dedicated to data security and privacy.

  • Azure has more certifications than any other cloud provider. View the comprehensive list.

Flexible pricing gives you the control you need

With Speech to Text, pay as you go based on the number of hours of audio you transcribe, with no upfront costs.
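Because billing is based on hours of audio transcribed, a rough monthly estimate is simple arithmetic. The sketch below is only illustrative: `estimate_monthly_cost`, `rate_per_hour`, and `free_hours` are hypothetical names, and the rate and any free allowance are placeholders you would take from the official pricing page. It also assumes simple proportional billing on total audio duration, which may not match the service's actual billing granularity.

```python
def estimate_monthly_cost(
    durations_s: list[float],
    rate_per_hour: float,
    free_hours: float = 0.0,
) -> float:
    """Rough pay-as-you-go estimate for one month of transcription.

    durations_s:   length of each transcribed audio file, in seconds
    rate_per_hour: placeholder price per audio hour (look up the real rate)
    free_hours:    placeholder free monthly allowance, if any
    """
    if rate_per_hour < 0 or free_hours < 0 or any(d < 0 for d in durations_s):
        raise ValueError("durations, rate, and free hours must be non-negative")
    total_hours = sum(durations_s) / 3600          # seconds -> hours
    billable_hours = max(0.0, total_hours - free_hours)
    return billable_hours * rate_per_hour


if __name__ == "__main__":
    # Two 30-minute calls and one 2-hour recording at a made-up rate of 2.0/hour.
    print(estimate_monthly_cost([1800, 1800, 7200], rate_per_hour=2.0))
```

Summing durations before subtracting the allowance reflects the idea that you pay only for usage beyond your free monthly amount, not per file.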

Get started with an Azure free account

1. Start free. Get $200 credit to use within 30 days. While you have your credit, get free amounts of many of our most popular services, plus free amounts of 55+ other services that are always free.

2. After your credit, move to pay as you go to keep building with the same free services. Pay only if you use more than your free monthly amounts.

3. After 12 months, you'll keep getting 55+ always-free services and still pay only for what you use beyond your free monthly amounts.

Businesses that trust Speech to Text

KPMG streamlines call transcription

KPMG uses Speech to Text to transcribe and catalog thousands of hours of calls, reducing compliance costs for its clients by as much as 80 percent.


Motorola helps first responders access vital data using voice

Motorola Solutions is helping police officers and other emergency first responders gain faster access to important information with a voice-powered virtual assistant.

Universal Electronics delivers voice-enabled smart home experiences

Universal Electronics is helping brands deliver voice-enabled navigation and control capabilities that work across everyday devices found in the home—offering a truly unique consumer experience.

 

Hochtief documents construction defects using voice

Hochtief is helping project managers identify and document construction defects at project sites with a voice-enabled virtual assistant.

NTT DATA accelerates decision-making with meeting insights

NTT DATA is unlocking insights from speech data with real-time meeting transcription. With Custom Speech, they are able to customize speech recognition models to understand organization-specific terms.

Insight powers conversational banking experiences

Insight Enterprises is helping banks bring digital speed and convenience to their branches with a conversational AI-powered banking solution. Speech to Text converts what customers say into data that can be processed and analyzed so that customers can get timely, relevant responses.

Documentation and resources

Get started

Browse the documentation

Create a speech service with the Microsoft Learn course

Explore code samples

Check out our sample code

See customization resources

Explore and customize your voice-to-text solution with Speech Studio. No code required.

Frequently asked questions about Speech to Text

  • Speech to Text is a feature within the Speech service that accurately and quickly transcribes audio to text.

  • Cognitive Services are a collection of customizable, prebuilt AI models that you can use to add AI to applications. They span several domains, including Speech, Decision, Language, and Vision. Speech to Text is one feature within the Speech service. Other Speech-related features include Text to Speech, Speech Translation, and Speaker Recognition. An example of a Decision service is Personalizer, which allows you to deliver personalized, relevant experiences. Examples of Language services include Language Understanding, Text Analytics for natural language processing, QnA Maker for FAQ experiences, and Translator for language translation.

Start building with Cognitive Services

Try Speech to text free
