Speech to Text

Convert spoken audio to text for more natural interactions

Advanced speech recognition

Use Speech to Text, part of the Speech service, to swiftly convert audio from a variety of sources into text. Customise models to overcome common speech recognition barriers, such as unique vocabularies, speaking styles, or background noise. Make audio more accessible by helping everyone follow and engage in conversations in real time.

Breakthrough innovation

Benefit from leading-edge speech recognition accuracy powered by deep neural network models.

Real-time engagement

Transcribe audio to text in real time so that all participants in a conversation can fully engage.

Customised speech recognition

Tailor speech recognition to speaking styles and domain-specific terminology.

Flexible deployment

Run Speech to Text anywhere—in the cloud, on-premises or on the edge in containers.

Use breakthrough speech technology

Enhance your apps with speech capabilities powered by decades of breakthrough research. Microsoft was the first to reach human parity on the Switchboard conversational speech recognition task and continues to drive cutting-edge research in speech recognition.

Learn more about advancements in speech

Optimise speech recognition with tailored models

Customise your speech recognition models to overcome common speech recognition barriers. Tailor your language models to adapt to users' speaking styles, accents or unique vocabulary, like place names, products and industry-specific expressions. Automatically generate custom models using your Office 365 data to optimise speech recognition accuracy for organisation-specific terms.

Start using Custom Speech
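
One common way to judge whether a customised model actually helps is to compare word error rate (WER) on the same test sentences before and after customisation. The following is a minimal sketch of that comparison in plain Python, not the service's own evaluation tooling; the function name and the sample sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, invented for this example.
    reference = "send the contoso invoice to fabrikam"
    baseline_output = "send the console invoice to fabric"
    custom_output = "send the contoso invoice to fabrikam"
    print("baseline WER:", word_error_rate(reference, baseline_output))
    print("custom WER:  ", word_error_rate(reference, custom_output))
```

A lower WER on domain-specific sentences suggests the customised model is handling the vocabulary better than the baseline.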

Gain insights from your conversations

Transcribe multi-user conversations in real time, allowing participants to focus on the discussion. Identify who said what and when, so you can quickly follow up on next steps. Optimise the experience for multi-microphone devices. Enable analytics on your transcribed text to extract further insights from your conversations.

Learn more about the Conversation Transcription capability

Deploy anywhere, from the cloud to the edge

Run Speech to Text in the cloud or on premises with containers for scenarios where data security and low latency are paramount.

Learn more about Speech in containers

Security for the enterprise

  • Microsoft invests more than USD 1 billion annually in cyber security research and development.

  • We employ more than 3,500 security experts completely dedicated to your data security and privacy.

  • Azure has more compliance certifications than any other cloud provider. View the comprehensive list.

Get the power, control and customisation you need with flexible pricing

Pay only for what you use, with no upfront costs. With Speech to Text, you pay as you go, based on hours of audio transcribed.
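
Because billing is based on hours of audio transcribed, a rough estimate is simply audio duration multiplied by an hourly rate. The sketch below assumes cost scales linearly with duration; the rate in the example is a made-up placeholder, not a published price, and actual billing increments may differ from this simple model.

```python
def estimate_transcription_cost(audio_seconds: float, rate_per_hour: float) -> float:
    """Estimate pay-as-you-go cost for a given amount of audio.

    Both arguments are supplied by the caller; rate_per_hour should come from
    the current pricing page for your region and tier.
    """
    if audio_seconds < 0 or rate_per_hour < 0:
        raise ValueError("duration and rate must be non-negative")
    hours = audio_seconds / 3600
    return round(hours * rate_per_hour, 2)


if __name__ == "__main__":
    HYPOTHETICAL_RATE_PER_HOUR = 1.00  # placeholder value, not a real price
    ninety_minutes = 90 * 60
    print(estimate_transcription_cost(ninety_minutes, HYPOTHETICAL_RATE_PER_HOUR))
```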

See Speech to Text pricing

Get started with Speech to Text in three steps

  1. Get instant access and a USD 200 credit by signing up for an Azure free account.
  2. Sign in to the Azure portal and add Speech.
  3. Learn how to embed Speech to Text from the quickstarts and documentation.
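
Once a Speech resource exists, a first call can be as small as one HTTP request. The sketch below builds a request for the speech-to-text REST endpoint for short audio using only the Python standard library. The endpoint path and header names reflect our understanding of that REST API and should be checked against the current reference; the region, key and file name are placeholders you would replace with your own.

```python
import json
import urllib.parse
import urllib.request


def build_recognition_request(region: str, key: str, wav_bytes: bytes,
                              language: str = "en-US") -> urllib.request.Request:
    """Build (but do not send) a short-audio recognition request."""
    query = urllib.parse.urlencode({"language": language})
    url = (
        f"https://{region}.stt.speech.microsoft.com"
        f"/speech/recognition/conversation/cognitiveservices/v1?{query}"
    )
    headers = {
        "Ocp-Apim-Subscription-Key": key,
        # Assumes 16 kHz PCM WAV input.
        "Content-Type": "audio/wav; codecs=audio/pcm; samplerate=16000",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, data=wav_bytes, headers=headers, method="POST")


def recognise(request: urllib.request.Request) -> str:
    """Send the request and return the recognised text, if any."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body.get("DisplayText", "")


if __name__ == "__main__":
    # Placeholders: substitute your own region, key and audio file.
    with open("sample.wav", "rb") as audio_file:
        request = build_recognition_request("westeurope", "YOUR_KEY", audio_file.read())
    print(recognise(request))
```

Separating request construction from sending keeps the key and region handling easy to inspect before any audio leaves your machine.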

Developer resources for Speech to Text

Documentation and tutorials

Get started with Speech to Text.

Courses

Take a Pluralsight course that walks you through using Speech to Text.

Use cases

Learn more about scenarios for Speech to Text, such as conversation and call centre transcription.

Frequently asked questions about Speech to Text

  • For a full list of languages supported by Speech to Text, see our documentation.
  • Easily capture audio from a microphone, read from a stream, or access audio files from storage with the Speech SDK and REST APIs. The Speech SDK supports WAV/PCM 16-bit, 16 kHz/8 kHz, single-channel audio for speech recognition. Additional audio formats are supported using the speech-to-text REST endpoint or the batch transcription service.
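
Before passing a file to the Speech SDK, it can help to confirm that it matches the format described above: 16-bit PCM, 16 kHz or 8 kHz, single channel. The following sketch uses Python's standard `wave` module to do that check; the function names are illustrative and not part of the Speech SDK.

```python
import io
import wave
from typing import List

SUPPORTED_SAMPLE_RATES = (8000, 16000)


def check_wav_format(source) -> List[str]:
    """Return a list of format problems; an empty list means the file matches.

    `source` is a file path (str) or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected 1 channel, found {wav.getnchannels()}")
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(f"expected 16-bit samples, found {wav.getsampwidth() * 8}-bit")
        if wav.getframerate() not in SUPPORTED_SAMPLE_RATES:
            problems.append(f"expected 8 kHz or 16 kHz, found {wav.getframerate()} Hz")
    return problems


def make_silent_wav(channels: int, sample_width: int, rate: int) -> io.BytesIO:
    """Create a short silent WAV in memory, for trying out the check."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * channels * sample_width * 160)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(check_wav_format(make_silent_wav(1, 2, 16000)))
    print(check_wav_format(make_silent_wav(2, 2, 44100)))
```

Files that fail the check can be converted first, or sent through the REST endpoint or batch transcription service where additional formats are accepted.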

Get started with Speech to Text