
Azure AI Speech

Build multimodal, multilingual AI apps faster with pre-built or customizable speech models.
OVERVIEW

Add multimodality to your generative AI apps

  • Build voice-enabled, multilingual generative AI apps with fast transcriptions and natural-sounding voices.
  • Customize speech models for your domain, including the OpenAI Whisper model, or give your copilot a branded voice.
  • Enable real-time, multi-language speech to speech translation and speech to text transcription of audio streams.
  • Run AI models wherever your data resides. Deploy your apps in the cloud or at the edge with containers.
USE CASES

Develop multimodal generative AI apps with speech models

Transcribe speech to text

Transcribe call center or meeting conversations. Go global with audio-captioning in more than 100 languages.

Convert text to speech

Build bots that speak naturally. Differentiate your brand with customized, realistic voices and speaking styles.

Speech analytics

Analyze audio or video call recordings to gain deep insights. Summarize key topics and extract or redact personally identifiable information.

Transcribe audio with OpenAI Whisper

Transform your call centers using the latest OpenAI Whisper model in Azure AI Speech or Azure OpenAI Service.

Build custom voices

Build natural-sounding voices with custom neural voice.

Build your avatars

Bring your brand to life using pre-built or custom avatars with natural-sounding voices.

Verify and recognize speakers

Confirm a person's identity or recognize who's speaking in a meeting by adding speaker verification and identification to your app.

Enable multilingual communication

Translate audio or video data from and into an ever-growing list of supported languages. Customize translations to your industry.

Embed speech

Use embedded speech to power on-device speech to text and text to speech scenarios where cloud connectivity is intermittent or unavailable.
SECURITY

Built-in security and compliance 

Microsoft has committed to investing USD20 billion in cybersecurity over five years.
We employ more than 8,500 security and threat intelligence experts across 77 countries.
Azure has one of the largest compliance certification portfolios in the industry.
PRICING

Flexible pricing to meet your needs

Pay only for what you use, with no upfront costs. Azure AI Speech offers pay-as-you-go pricing.
CUSTOMER STORIES

See what customers are building with Azure AI Speech

FAQ

Frequently asked questions

  • Azure AI Speech offers a number of features and capabilities, including speech to text, text to speech, and speech translation. These are offered through SDKs in several programming languages, including C#, C++, Java, and more.
  • Yes, Azure AI Speech supports OpenAI’s Whisper model, especially for batch transcriptions.
  • Azure AI Speech supports an ever-growing set of languages. For the current list of supported languages, refer to the Azure AI Speech language support documentation.
  • Customers are building interesting applications using Azure AI services. Get started with Speech analytics in Azure AI Studio for conversational AI, post-call analytics, video summarization, and more use cases.
 Account signup

Get started with a free account

Start with USD200 Azure credit
 Account signup

Get started with pay-as-you-go pricing

There’s no upfront commitment—cancel anytime.