Text to Speech

Convert text to speech to create more natural, accessible interfaces

Bring natural voice to your apps

Build apps and services that speak to users naturally. The Text to Speech API — part of Cognitive Services speech services — converts text to audio in near real time, improving accessibility and usability for customers. The API converts text generated by the app into audio that can be played back and saved as a file for later use.

The service speaks to users in multiple languages. Choose from more than 75 voices in over 45 languages or locales, including options for male and female voices, and adjust parameters like speed, pitch, volume, pronunciation, and additional pauses.
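Adjustments such as speed, pitch, volume, and pauses are commonly expressed in SSML (the W3C Speech Synthesis Markup Language). The following is a minimal sketch, not the service's official client: it only builds an SSML string with the standard `prosody` and `break` elements. The function name `build_ssml` and the voice name passed to it are hypothetical, and the set of values a given voice accepts should be checked against the service documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(text, voice_name, lang="en-US",
               rate="medium", pitch="medium", volume="medium",
               pause_ms=0):
    """Return an SSML document as a string.

    voice_name is a placeholder: use a voice your service actually offers.
    rate, pitch, and volume use the standard SSML prosody labels.
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": "http://www.w3.org/2001/10/synthesis",
        "xml:lang": lang,
    })
    voice = ET.SubElement(speak, "voice", {"name": voice_name})
    prosody = ET.SubElement(voice, "prosody", {
        "rate": rate,
        "pitch": pitch,
        "volume": volume,
    })
    # ElementTree escapes characters such as & and < in the text for us.
    prosody.text = text
    if pause_ms > 0:
        # Optional pause after the spoken text.
        ET.SubElement(voice, "break", {"time": f"{pause_ms}ms"})
    return ET.tostring(speak, encoding="unicode")
```

Calling `build_ssml("Welcome back", "example-voice", rate="slow", pause_ms=500)` yields a document that asks for slower speech followed by a half-second pause. Building the markup with an XML library rather than string concatenation keeps user-supplied text safely escaped.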

See it in action

To see how speech synthesis works, click Play.

Language Sample Text Sample Voice
English (US) An airport spokesman said more than 110 planes were damaged by hail.
Chinese (CN) 广告收入的比例高达90%以上
Japanese (JP) 皆様のご協力のたまものと
German (DE) Der Anstieg der Verbraucherpreise in der Eurozone verlangsamt sich weiter.
Spanish (ES) El alcalde de Santiago convoca a los medios para inaugurar dos semáforos.
Turkish (TR) Tren durduğu sırada vagonun ortasında bir patlama meydana geldi.

Want to build this?

Text to Speech with custom voice models

Do you need to give your voice agent a unique, recognizable brand voice? The Text to Speech voice customization feature makes it easy to create one-of-a-kind, voice-enabled apps, with no expertise required.

Want to start building your own voice model?

Voice models made easy

To customize your voice agent, simply record and upload training data, and the service creates a unique voice font tuned to your recording. Start a proof of concept with a small amount of data. The system scales seamlessly as your data increases, enhancing the natural voice quality.

Consistent and integrated

Custom voice models are fully integrated with other Cognitive Services speech services. No coding is required, and you can easily deploy your customized voice model to the API.

Fast and secure

Through a unique API endpoint and secure authentication management, you can plug in your voice fonts quickly across all platforms. Your models remain under your control.
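As a rough illustration of calling a custom voice through its own endpoint with a key, the sketch below prepares (but does not send) an HTTP POST carrying SSML. Everything specific here is an assumption rather than the documented API: the function name `build_synthesis_request`, the header names, the output format string, and the endpoint URL are placeholders to be replaced with the values given for your deployment.

```python
import urllib.request


def build_synthesis_request(endpoint_url, api_key, ssml,
                            output_format="riff-16khz-16bit-mono-pcm"):
    """Prepare a POST request that submits SSML to a synthesis endpoint.

    The request is only constructed here, never sent.
    Header names and output_format are assumptions for illustration.
    """
    headers = {
        "Content-Type": "application/ssml+xml",
        "Ocp-Apim-Subscription-Key": api_key,       # assumed header name
        "X-Microsoft-OutputFormat": output_format,  # assumed header name
    }
    return urllib.request.Request(
        endpoint_url,
        data=ssml.encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Sending the request with `urllib.request.urlopen` and writing the response body to a file would give audio that can be played back or saved for later use. Keep the key out of source control, for example by reading it from an environment variable.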

Explore a Speech Scenario

Intelligent kiosk

Speech services combined with Language Understanding enable apps and users to interact naturally. Use Speech to Text to capture a user’s question, Language Understanding to parse intent and formulate an appropriate reply, and Text to Speech to synthesize the text into a spoken response. Create conversational interfaces for various scenarios like banking, travel, and entertainment.
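The three-stage kiosk flow described above can be sketched as a small pipeline. This is an illustrative structure only: `KioskPipeline` and its three callables are hypothetical stand-ins for whatever speech-to-text, language-understanding, and text-to-speech clients an app actually uses, and no real service is called.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class KioskPipeline:
    """Chains the three stages; each stage is injected as a callable."""
    speech_to_text: Callable[[bytes], str]   # audio in, transcript out
    understand: Callable[[str], str]         # transcript in, reply text out
    text_to_speech: Callable[[str], bytes]   # reply text in, audio out

    def respond(self, audio_in: bytes) -> bytes:
        # 1. Capture the user's question as text.
        question = self.speech_to_text(audio_in)
        # 2. Parse intent and formulate a reply.
        reply = self.understand(question)
        # 3. Synthesize the reply into audio.
        return self.text_to_speech(reply)
```

Injecting the stages as callables keeps the flow testable with simple stubs and lets each service be swapped or mocked independently.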

Commerce chatbot


Together, the Azure Bot Service and Language Understanding service enable developers to create conversational interfaces for various scenarios like banking, travel, and entertainment. For example, a hotel’s concierge can use a bot to enhance traditional e-mail and phone call interactions by validating a customer via Azure Active Directory and using Cognitive Services to process customer requests in context, whether they arrive as text or voice. The Speech recognition service can be added to support voice commands.

  1. The customer uses your mobile app
  2. The user authenticates using Azure AD B2C
  3. The user requests information using the custom Application Bot
  4. Cognitive Services helps process the natural language request
  5. The customer reviews the response and can refine the question using natural conversation
  6. Once the user is happy with the results, the Application Bot updates the customer’s reservation
  7. Application Insights gathers runtime telemetry to help development with bot performance and usage
"Microsoft Cognitive Services gives us a huge range of opportunities. It's a perfect match for us now and in the future, when we want to add more features to our app."

Jaan Apajalahti, CEO

Explore the Cognitive Services APIs

Computer Vision

Distill actionable information from images


Face

Detect, identify, analyze, organize, and tag faces in photos

Video Indexer

Unlock video insights

Content Moderator

Automated image, text, and video moderation

Custom Vision PREVIEW

Easily customize your own state-of-the-art computer vision models for your unique use case

Text Analytics

Easily evaluate sentiment and topics to understand what users want

Translator Text

Easily conduct machine translation with a simple REST API call

Bing Spell Check

Detect and correct spelling mistakes in your app

Language Understanding

Teach your apps to understand commands from your users

Speaker Recognition PREVIEW

Use speech to identify and verify individual speakers

Speech Services

Unified speech services for speech-to-text, text-to-speech and speech translation

QnA Maker

Distill information into conversational, easy-to-navigate answers

Use the Speech Devices SDK to build an ambient device and create a custom wake word
