Cognitive Services pricing—Speech Services

Use intelligence APIs to enable vision, speech, language, and knowledge capabilities

The unified Speech services provide a wide range of speech recognition and generation capabilities, including speech transcription, text-to-speech, and speech translation.

The pricing below reflects a Preview discount and goes into effect on June 1, 2018. As a limited-time promotion, usage prior to June 1, 2018 will not be charged.

Pricing Details

Category           | Feature                                 | Free Tier               | S1 Tier
Speech Translation | Speech Translation                      | 5 hours per month       | $- per hour
Speech to Text     | Speech to Text                          | 5 hours per month       | $- per hour
                   | Speech to Text with Custom Speech Model | 5 hours per month       | $- per hour
                   | Custom Speech Model Hosting             | 1 model                 | $-/model/month
Text to Speech     | Text to Speech                          | 5M characters per month | $- per 1M chars
                   | Text to Speech with Custom Voice Font   | 5M characters per month | $- per 1M chars
                   | Custom Voice Font Hosting               | 1 model                 | $-/model/month

Support & SLA

  • Free billing and subscription management support are included.
  • Need tech support for preview services? Use our forums.
  • We guarantee that Cognitive Services running in the standard tier will be available at least 99.9 percent of the time. No SLA is provided for the free tier. Read the SLA.
  • No SLA during preview period. Learn more.


Speech Services

    • For Speech Translation, Speech to Text, and Speech to Text with Custom Speech Model: usage is billed in one-second increments.
    • For Text to Speech and Text to Speech with Custom Voice Font: usage is billed per character.
    • For Custom Speech Model Hosting and Custom Voice Font Hosting: usage is billed daily.
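
As a rough sketch of how these three billing units could be turned into an estimate: the rates passed in below are placeholders chosen by the caller (no prices are taken from the table), and the rounding of partial seconds, the one-code-point-per-character count, and the 30-day month used for daily hosting are all assumptions rather than documented behavior.

```python
import math
from decimal import Decimal


def speech_cost(duration_seconds: float, rate_per_hour: Decimal) -> Decimal:
    """Audio billed in one-second increments at an hourly rate."""
    # Assumption: a partial second is rounded up to a full billable second.
    billable_seconds = math.ceil(duration_seconds)
    return rate_per_hour * billable_seconds / Decimal(3600)


def tts_cost(text: str, rate_per_million_chars: Decimal) -> Decimal:
    """Text to Speech billed per character."""
    # Assumption: each Python code point counts as one billed character.
    return rate_per_million_chars * len(text) / Decimal(1_000_000)


def hosting_cost(days_hosted: int, rate_per_model_month: Decimal,
                 days_in_month: int = 30) -> Decimal:
    """Model hosting billed daily against a monthly rate."""
    # Assumption: the monthly rate is spread evenly over days_in_month.
    return rate_per_model_month * days_hosted / Decimal(days_in_month)
```

With a hypothetical rate of 2 per hour, `speech_cost(3600, Decimal("2"))` comes to exactly one hour's charge, and shorter clips scale down second by second.
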
  • The Speech service enables users to adapt baseline models using their own acoustic and language data, producing custom speech models that can be used with both Speech to Text and Speech Translation.

  • The language model is a probability distribution over sequences of words. The language model helps the system decide among sequences of words that sound similar, based on the likelihood of the word sequences themselves. For example, “recognize speech” and “wreck a nice beach” sound alike but the first hypothesis is far more likely to occur, and therefore will be assigned a higher score by the language model. If you expect voice queries to your application to contain particular vocabulary items, such as product names or jargon that rarely occur in typical speech, it is likely that you can obtain improved performance by customizing the language model. For example, if you were building an app to search MSDN by voice, it’s likely that terms like “object-oriented” or “namespace” or “dot net” will appear more frequently than in typical voice applications. Customizing the language model will enable the system to learn this.
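
To make the idea of scoring word sequences concrete, here is a toy bigram model with add-one smoothing. It is only an illustration of the general technique, not the model the Speech service uses; the training sentences are made up, and the function names are hypothetical.

```python
import math
from collections import Counter

# Made-up training sentences, for illustration only.
CORPUS = [
    "we recognize speech with a model",
    "the system can recognize speech",
    "recognize speech in a noisy room",
    "a nice day at the beach",
]


def train_bigrams(sentences):
    """Count word pairs; '<s>' marks the start of each sentence."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split()
        vocab.update(tokens[1:])
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    # +1 reserves probability mass for words never seen in training.
    return bigrams, contexts, len(vocab) + 1


def log_score(hypothesis, model):
    """Sum of log P(word | previous word) with add-one smoothing."""
    bigrams, contexts, vocab_size = model
    tokens = ["<s>"] + hypothesis.split()
    return sum(
        math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size))
        for prev, word in zip(tokens, tokens[1:])
    )


model = train_bigrams(CORPUS)
hypotheses = ["recognize speech", "wreck a nice beach"]
best = max(hypotheses, key=lambda h: log_score(h, model))
```

On this tiny corpus `best` is "recognize speech". Adding domain sentences to `CORPUS` (product names, jargon) shifts the scores toward that vocabulary, which is the effect customization aims for.
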

  • The acoustic model is a classifier that labels each short fragment of audio as one of several phonemes, or sound units, in each language. These phonemes can then be stitched together to form words. For example, the word “speech” is composed of four phonemes: “s p iy ch”. These classifications are made on the order of 100 times per second. Customizing the acoustic model can help the system recognize speech more accurately in atypical environments. For example, if you have an app designed to be used by workers in a warehouse or factory, a customized acoustic model can more accurately recognize speech in the presence of the noises found in these environments.

  • Microsoft Speech Services provide 70+ default voices (a.k.a. voice fonts) in 40+ languages to help you convert your text into audio. With the rise of virtual assistants and other speech-enabled applications, however, many companies would like a unique voice that represents their business and is carefully designed for their own brand identity. For example, if you are developing a chatbot for your customer care service, you can give it a unique brand voice for your company to build customer attachment. Likewise, an in-car navigation software developer can enable Text-to-Speech in different custom voices to enrich the user experience.

    Voice Studio, the custom voice building portal, makes this easy. Using your own audio data (recorded human voice with the associated scripts), you can generate a custom voice font, which is then deployed to the Microsoft Text-to-Speech service and can be plugged into your applications through an API endpoint for your own use.


  • The Emotion API, Face API, Language Understanding Intelligent Service API, Bing Speech-to-Text API, and Bing Text-to-Speech API are billed per 1,000 API transaction calls when a production API call is being actively executed. Billing is prorated for production API transaction call quantities.

    The Bing Long Form Speech API service is billed per hour of speech that is analyzed. The billing is prorated on a per-minute basis.

    The Recommendations API and Text Analytics API can be purchased in units of the standard tiers at a fixed price. Each unit of a tier comes with included quantities of API transactions. If the user exceeds the included quantities, overages are charged at the rate specified in the pricing table above. These overages are prorated, and the service is billed on a monthly basis. The included quantities in a tier are reset each month.

  • Usage is throttled if the transaction limit is reached on the free tier. Customers can't accrue overages on the free tier.

  • Any annotation to a document counts as a transaction. Batch scoring calls also take into account the number of documents scored in that call. For instance, if 1,000 documents are sent for sentiment analysis in a single API call, that counts as 1,000 transactions. If an API supports more than one annotation operation, each operation is counted as well. For example, if an API call performs both sentiment analysis and key-phrase extraction on 1,000 documents, that counts as 2,000 transactions (2 annotations × 1,000 documents).
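
The counting rule above reduces to a single multiplication. A minimal sketch, using a hypothetical helper name:

```python
def count_transactions(num_documents: int, num_annotations: int = 1) -> int:
    """Transactions billed for one API call: documents x annotation operations."""
    return num_documents * num_annotations


# The two cases described in the text.
sentiment_only = count_transactions(1000)
sentiment_and_key_phrases = count_transactions(1000, num_annotations=2)
```

Here `sentiment_only` is 1,000 and `sentiment_and_key_phrases` is 2,000, matching the figures given in the text.
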

  • If the usage on a standard tier is exceeded, the account starts to accrue overages. These overages are billed on a monthly basis, and are calculated at the rate specified for each tier.
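
The difference between free-tier throttling and standard-tier overages can be sketched as follows. The function names, the per-transaction rate, and the quantities are hypothetical; actual included quantities and overage rates come from the pricing table for each tier.

```python
from decimal import Decimal


def free_tier_allows(used: int, limit: int, requested: int = 1) -> bool:
    """Free tier: calls past the limit are throttled, never billed."""
    return used + requested <= limit


def overage_charge(used: int, included: int, rate_per_transaction: Decimal) -> Decimal:
    """Standard tier: only usage beyond the included quantity is charged."""
    return max(0, used - included) * rate_per_transaction
```

For instance, with 1,000 included transactions and a made-up rate of 0.50 per transaction, 1,200 transactions in a month would accrue an overage on the extra 200 only, while 800 would accrue none.
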

  • You may upgrade to a higher tier at any time. The billing rate and included quantities corresponding to the higher tier will begin immediately.


