Unified speech services for speech-to-text, text-to-speech and speech translation
The Speech service provides a wide range of speech recognition and generation capabilities, including speech transcription, text-to-speech, speech translation and speaker recognition.
Explore pricing options
Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered into with Microsoft, the date of purchase, and the currency exchange rate. Prices are calculated in US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the forthcoming month. Sign in to the Azure pricing calculator to see pricing based on your current programme/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.
US government entities are eligible to purchase Azure Government services from a licensing solution provider with no upfront financial commitment, or directly through a pay-as-you-go online subscription.
Important—The price in R$ is merely a reference; this is an international transaction and the final price is subject to exchange rates and the inclusion of IOF taxes. An eNF will not be issued.
Free (F0)
Category | Features | Price |
---|---|---|
Speech to Text (per second billing) | Standard | 5 audio hours free per month³ |
Speech to Text (per second billing) | Custom | 5 audio hours free per month³; Endpoint hosting: 1 model free per month¹ |
Speech to Text (per second billing) | Conversation transcription multi-channel audio PREVIEW | 5 audio hours free per month |
Text to Speech (per character billing) | Neural | 0.5 million characters free per month |
Speech Translation (per second billing) | Standard | 5 audio hours free per month |
Speaker Recognition (per transaction billing) | Speaker verification² | 10,000 free transactions per month |
Speaker Recognition (per transaction billing) | Speaker identification² | 10,000 free transactions per month |
Speaker Recognition (per transaction billing) | Voice Profile Storage | 10,000 free transactions per month |
Pay as You Go: pay only for what you use.
Category | Features | Price |
---|---|---|
Speech to Text (per second billing) | Standard | Real-time Transcription: $- per hour; Fast Transcription: $- per hour⁹; Batch Transcription: $- per hour¹ |
Speech to Text (per second billing) | Custom | Real-time Transcription: $- per hour; Batch Transcription: $- per hour¹; Endpoint hosting: $- per model per hour; Customised Speech Training⁵: $- per compute hour |
Speech to Text (per second billing) | Enhanced add-on features | Real-time: $- per hour per feature; Batch (Continuous Language identification, Diarization): Included in Standard/Customised (no extra charge) |
Speech to Text (per second billing) | Conversation transcription multi-channel audio PREVIEW | $- per hour² |
Speech Translation (per second billing) | Real-time Speech Translation | $- per audio hour³ |
Speech Translation (per second billing) | Video Translation PREVIEW | Batch: $- per output video minute; Content editing: $- per output video minute; Personal Voice: $- per output video minute |
Text to Speech⁸ | Standard Voice | Neural: $- per 1M characters; Neural HD⁴: $- per 1M characters |
Text to Speech⁸ | Customised Voice, Professional Voice | Synthesis: $- per 1M characters; Voice model training: $- per compute hour, up to $- per training session; Endpoint hosting: $- per model per hour |
Text to Speech⁸ | Customised Voice, Personal Voice⁶ | Synthesis: $- per 1M characters; Voice creation: Free; Voice profile storage: $- per 1,000 voice profiles per month |
Text to Speech⁸ | Enhanced Add-on feature: Avatar, Standard | $- per minute |
Text to Speech⁸ | Enhanced Add-on feature: Avatar, Custom | Real-time synthesis: $- per minute; Batch synthesis: $- per minute; Endpoint hosting: $- per model per hour |
Speaker Recognition (per transaction billing) | Speaker verification⁷ | $- per 1,000 transactions |
Speaker Recognition (per transaction billing) | Speaker identification⁷ | $- per 1,000 transactions |
Speaker Recognition (per transaction billing) | Voice Profile Storage | $- per 1,000 voice profiles (10,000 free voice profiles per month) |
Commitment Tiers – Azure - Standard
Category | Features | Price (per month) | Overage |
---|---|---|---|
Speech-to-Text | Standard | $- for 2,000 hours | $- per hour |
Speech-to-Text | Standard | $- for 10,000 hours | $- per hour |
Speech-to-Text | Standard | $- for 50,000 hours | $- per hour |
Speech-to-Text | Custom | $- for 2,000 hours | $- per hour |
Speech-to-Text | Custom | $- for 10,000 hours | $- per hour |
Speech-to-Text | Custom | $- for 50,000 hours | $- per hour |
Speech-to-Text | Enhanced add-on features² | $- for 2,000 hours | $- per hour |
Speech-to-Text | Enhanced add-on features² | $- for 10,000 hours | $- per hour |
Speech-to-Text | Enhanced add-on features² | $- for 50,000 hours | $- per hour |
Text to Speech | Neural¹ | $- for 80M characters | $- per 1M characters |
Text to Speech | Neural¹ | $- for 400M characters | $- per 1M characters |
Text to Speech | Neural¹ | $- for 2,000M characters | $- per 1M characters |
¹ Real-time synthesis only; this does not include long audio creation.
² Real-time speech to text only; the Continuous Language Identification and Diarization add-on features are included with batch speech to text.
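To make the relationship between a tier's monthly price and its overage rate concrete, here is a minimal Python sketch. It assumes that the tier price is a flat monthly fee and that only usage beyond the included quantity is billed at the overage rate; the function name `monthly_commitment_cost` and all figures are hypothetical placeholders, not Azure prices.

```python
from decimal import Decimal


def monthly_commitment_cost(used_units, included_units, tier_price, overage_rate):
    """Return one month's charge for a single commitment tier.

    Assumption: the tier price is a flat monthly fee, and only usage
    beyond the included quantity is billed at the overage rate.
    """
    overage_units = max(Decimal(0), Decimal(used_units) - Decimal(included_units))
    return Decimal(tier_price) + overage_units * Decimal(overage_rate)


# Hypothetical figures, not Azure prices: a 2,000-hour tier at 100 per month
# with overage at 0.06 per hour, and 2,500 hours of audio used in the month.
print(monthly_commitment_cost(2500, 2000, "100", "0.06"))
```

The same function works for the Text to Speech tiers if the units are millions of characters instead of hours.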
Commitment Tiers – Connected container
Category | Features | Price (per month) | Overage |
---|---|---|---|
Speech-to-Text² | Standard | $- for 2,000 hours | $- per hour |
Speech-to-Text² | Standard | $- for 10,000 hours | $- per hour |
Speech-to-Text² | Standard | $- for 50,000 hours | $- per hour |
Speech-to-Text² | Custom | $- for 2,000 hours | $- per hour |
Speech-to-Text² | Custom | $- for 10,000 hours | $- per hour |
Speech-to-Text² | Custom | $- for 50,000 hours | $- per hour |
Speech-to-Text² | Enhanced add-on features² | $- for 2,000 hours | $- per hour |
Speech-to-Text² | Enhanced add-on features² | $- for 10,000 hours | $- per hour |
Speech-to-Text² | Enhanced add-on features² | $- for 50,000 hours | $- per hour |
Text to Speech | Neural¹ | $- for 80M characters | $- per 1M characters |
Text to Speech | Neural¹ | $- for 400M characters | $- per 1M characters |
Text to Speech | Neural¹ | $- for 2,000M characters | $- per 1M characters |
¹ Real-time synthesis only; this does not include long audio creation.
² Pricing applies to real-time and batch use cases. There is no separate batch pricing for containers.
See the documentation for information on Commitment tiers.
Commitment Tiers – Disconnected container
Sign up to access speech in disconnected containers, or learn more
Category | Features | Price (per year) | Max usage (per year) | Projected usage (per month) |
---|---|---|---|---|
Speech-to-Text² | Standard | $- | 120,000 hours | 10,000 hours |
Speech-to-Text² | Standard | $- | 600,000 hours | 50,000 hours |
Speech-to-Text² | Custom | $- | 120,000 hours | 10,000 hours |
Speech-to-Text² | Custom | $- | 600,000 hours | 50,000 hours |
Speech-to-Text² | Enhanced add-on features | $- | 120,000 hours | 10,000 hours |
Speech-to-Text² | Enhanced add-on features | $- | 600,000 hours | 50,000 hours |
Text to Speech | Neural¹ | $- | 4.8B characters | 400M characters |
Text to Speech | Neural¹ | $- | 24B characters | 2,000M characters |
¹ Real-time synthesis only; this does not include long audio creation.
² Pricing applies to real-time and batch use cases. There is no separate batch pricing for containers.
These features are being deprecated and are available only to existing customers. Check details and learn how to migrate to new features.
Instance | Category | Features | Price |
---|---|---|---|
Free - Web/Container (1 concurrent request) | Text to Speech | Standard | 5 million characters free per month |
Free - Web/Container (1 concurrent request) | Text to Speech | Custom | 5 million characters free per month; Endpoint hosting: 1 model free per month |
Standard - Web/Container (100 concurrent requests for Base model, 20 concurrent requests for Custom model) | Text to Speech | Standard | $- per 1M characters |
Standard - Web/Container (100 concurrent requests for Base model, 20 concurrent requests for Custom model) | Text to Speech | Custom | $- per 1M characters; Endpoint hosting: $- per model per hour |
Azure pricing and purchasing options
Connect with us directly
Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimisation and request a customised proposal.
Talk to a sales specialist
See ways to purchase
Purchase Azure services through the Azure website, a Microsoft representative or an Azure partner.
Explore your options
Additional resources
Azure AI Speech
Learn more about Azure AI Speech features and capabilities.
Pricing calculator
Estimate your expected monthly costs for using any combination of Azure products.
Documentation
Review technical tutorials, videos, and more Azure AI Speech resources.
Frequently asked questions
-
- For Speech to Text and Speech Translation, usage is billed in one-second increments.
- For Text to Speech, usage is billed per character. Check the definition of character in the pricing note.
- For customised neural voice hosting, usage is billed per endpoint per second. Check details in the pricing note.
- For personal voice profile storage, usage is billed per voice profile per day. Check details in the pricing note.
- For Text to Speech Avatar, usage is billed per second.
- For Speech to Text and Text to Speech (including Avatar), endpoint hosting for customised models is billed per second per model.
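The billing units above can be turned into a rough cost estimate. The following Python sketch is illustrative only: the function names and the rates are hypothetical placeholders (this page lists prices as $-), and it assumes audio duration is already expressed in whole seconds and that the character count follows the definition in the pricing note.

```python
from decimal import Decimal

SECONDS_PER_HOUR = 3600
CHARACTERS_PER_MILLION = 1_000_000


def audio_cost(audio_seconds: int, rate_per_hour: Decimal) -> Decimal:
    """Cost of audio billed in one-second increments at an hourly rate."""
    return rate_per_hour * Decimal(audio_seconds) / Decimal(SECONDS_PER_HOUR)


def text_to_speech_cost(characters: int, rate_per_million: Decimal) -> Decimal:
    """Cost of synthesis billed per character at a per-1M-character rate."""
    return rate_per_million * Decimal(characters) / Decimal(CHARACTERS_PER_MILLION)


# Hypothetical rates, chosen only to show the arithmetic.
HYPOTHETICAL_STT_RATE_PER_HOUR = Decimal("1.00")
HYPOTHETICAL_TTS_RATE_PER_MILLION = Decimal("16.00")

# An 8-second utterance sent to Speech to Text, and a 250-character prompt
# sent to Text to Speech.
print(audio_cost(8, HYPOTHETICAL_STT_RATE_PER_HOUR))
print(text_to_speech_cost(250, HYPOTHETICAL_TTS_RATE_PER_MILLION))
```

Because billing is per second rather than per hour, a short request costs a proportionally small fraction of the hourly rate.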
-
The Speech service enables users to adapt baseline models using their own acoustic and language data, producing custom speech models that can be used with both Speech to Text and Speech Translation.
-
The language model is a probability distribution over sequences of words. The language model helps the system to decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, “recognize speech” and “wreck a nice beach” sound alike but the first hypothesis is far more likely to occur, and therefore will be assigned a higher score by the language model. If you expect voice queries to your application to contain particular vocabulary items, such as product names or jargon that rarely occur in typical speech, it is likely that you can obtain improved performance by customising the language model. For example, if you were building an app to search MSDN by voice, it’s likely that terms like “object-oriented”, “namespace” or “dot net” will appear more frequently than in typical voice applications. Customising the language model will enable the system to learn this.
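As a toy illustration of how a language model scores competing word sequences, the Python sketch below trains an add-one smoothed bigram model on a tiny made-up corpus. This is not how the Speech service builds or customises its models; the corpus, the function names and the smoothing choice are all assumptions made for the example.

```python
import math
from collections import Counter

# Tiny made-up corpus, for illustration only.
CORPUS = [
    "please recognize speech",
    "we recognize speech",
    "systems recognize speech well",
    "a nice beach",
    "do not wreck a car",
]


def train_bigrams(sentences):
    """Count bigrams and their left contexts; return counts and vocabulary size."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(words)
        context_counts.update(words[:-1])
        bigram_counts.update(zip(words, words[1:]))
    return context_counts, bigram_counts, len(vocab)


def log_prob(phrase, context_counts, bigram_counts, vocab_size):
    """Add-one smoothed bigram log-probability of a phrase."""
    words = ["<s>"] + phrase.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(words, words[1:]):
        numerator = bigram_counts[(prev, word)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


base_model = train_bigrams(CORPUS)
print(log_prob("recognize speech", *base_model))
print(log_prob("wreck a nice beach", *base_model))

# "Customising" here simply means adding domain sentences to the training data.
custom_model = train_bigrams(CORPUS + ["search namespace"] * 3)
print(log_prob("search namespace", *base_model))
print(log_prob("search namespace", *custom_model))
```

With this corpus the model scores "recognize speech" above "wreck a nice beach", and the domain phrase scores higher once domain sentences are added, which mirrors the effect described above on a very small scale.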
-
The acoustic model is a classifier that labels short fragments of audio as one of several phonemes, or sound units, in each language. These phonemes can then be stitched together to form words. For example, the word “speech” comprises four phonemes: “s p iy ch”. These classifications are made on the order of 100 times per second. Customising the acoustic model can enable the system to do a better job of recognising speech in atypical environments. For example, if you have an app designed to be used by workers in a warehouse or factory, a customised acoustic model can more accurately recognise speech in the presence of the noises found in these environments.
-
Speech service offers a wide range of text-to-speech (TTS) voice fonts; however, custom neural voice allows you to build your own custom voice that suits your needs and your brand. Read the blog for more information.
-
Language identification allows you to identify a switch in spoken language and transcribe speech accordingly. This can be applied in scenarios where the audio language is unknown, or when speaker(s) may speak multiple languages. Single Language Identification is available at no additional cost. Continuous Language Identification is an enhanced add-on feature. Visit docs to learn more.
-
- Pronunciation assessment evaluates speech pronunciation and gives speakers feedback on the accuracy and fluency of spoken audio. With pronunciation assessment, language learners can practise, get instant feedback, and improve their pronunciation so that they can speak and present with confidence. Educators can use the capability to evaluate pronunciation of multiple speakers in real time. Visit docs to learn more.
- Pronunciation assessment is charged as standard Speech to Text. For example, evaluating 8 seconds of speech costs around $-.