Text to Speech

A Speech service feature that converts text to lifelike speech

Bring your apps to life with natural-sounding voices

Build apps and services that speak naturally, choosing from more than 100 voices in over 40 languages. Differentiate your brand with a customized voice, and access voices with different speaking styles and emotional tones to fit your use case—all in your preferred programming language.

Lifelike speech

Enable fluid, natural-sounding speech that matches the patterns and intonation of human voices.

Customizable voices

Create a unique voice that reflects your brand’s identity.

Fine-grained audio controls

Tune voice output for your scenarios by easily adjusting rate, pitch, pronunciation, pauses, and more.

Flexible deployment

Run Text to Speech anywhere—in the cloud or at the edge in containers.

Access a wide variety of voices for every scenario

Engage global audiences with more than 100 voices across more than 40 languages and variants. Bring your scenarios to life with highly expressive, humanlike voices. Neural Text to Speech supports several speaking styles, including chat, newscast, and customer service, as well as emotions such as cheerfulness and empathy.
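As a rough illustration of how a speaking style might be requested, the sketch below wraps text in an SSML style element. It is not taken from this page: the helper name `styled_ssml`, the voice name `en-US-AriaNeural`, the `mstts` namespace URI, and the style value `cheerful` are all assumptions, so check the current SSML documentation for the values your voice supports.

```python
from xml.sax.saxutils import escape, quoteattr

def styled_ssml(text, style, voice="en-US-AriaNeural", lang="en-US"):
    """Return an SSML string that asks a voice to speak `text` in `style`.

    The voice name, namespace URI, and style names are assumptions here.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        # express-as carries the requested speaking style (assumed element name)
        f'<mstts:express-as style={quoteattr(style)}>'
        f'{escape(text)}'
        '</mstts:express-as>'
        '</voice></speak>'
    )

if __name__ == "__main__":
    print(styled_ssml("Thanks for calling. How can I help?", "cheerful"))
```

The helper only builds the markup; sending it to the service is left to whichever SDK or REST client you use.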

English (US): Aria

Sample sentences
The third type, a logarithm of the unsigned fold change, is undoubtedly the most tractable.
As the name suggests, the original submarines came from Yugoslavia.
This is easy enough if you have an unfinished attic directly above the bathroom.

English (US): Guy

Sample sentences
Susan Candiotti reports they've given up their trip.
Carol knows my lifestyle.
The seagrass fiber is tough, durable, and smooth.

Chinese (CN): Xiaoxiao

Sample sentences
您好,欢迎致电客服中心。我是华北地区的客服人员,工号0165。请问有什么可以帮您?
想和你表白,试了一万种方式,找了一千次时机,但都放弃了,最终只能原地踏步。
负责人Michael透露,新推出的紧凑型SUV搭载了智能的音响系统,可以语音控制volume大小。不过,车身的整体造型还是个secret。

German (DE): Katja

Sample sentences
Bestimmte Berufsgruppen sind nur noch schwer zu rekrutieren.
Sein Gedicht steckt voller Übertreibungen, die für den Schriftsteller allerdings typisch sind.
Er organisiert eine Unterstützung der schwächeren durch die stärksten Bundesländer.

Italian (IT): Elsa

Sample sentences
Tenete conto di un fattore importante.
Alcuni prodotti in gran parte sono di buona qualità.
Crisi? Vietato rilassarsi, siamo ancora in emergenza.

Build a custom voice for your brand

Differentiate your brand with a unique custom voice. Develop a highly realistic voice for more natural conversational interfaces using the custom neural voice capability (preview), starting with 30 minutes of audio.

Tailor your speech output

Fine-tune audio to fit your scenario. Define lexicons and control speech parameters such as pronunciation, pitch, rate, pauses, and intonation with Speech Synthesis Markup Language (SSML) or with the audio content creation tool.
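A minimal sketch of what such SSML could look like, built in Python. It uses the standard SSML `prosody` and `break` elements to set rate, pitch, and a pause; the helper name `build_ssml`, the voice name `en-US-AriaNeural`, and the default values are illustrative assumptions rather than anything specified on this page.

```python
from xml.sax.saxutils import escape, quoteattr

def build_ssml(before, after, voice="en-US-AriaNeural", lang="en-US",
               rate="-10%", pitch="+5%", pause="500ms"):
    """Return SSML that speaks `before`, pauses, then speaks `after`.

    Rate and pitch apply to both parts. The voice name and defaults
    are assumptions chosen for illustration.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice)}>'
        # prosody adjusts speaking rate and pitch for the enclosed text
        f'<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)}>'
        f'{escape(before)}'
        # break inserts a pause of the given duration
        f'<break time={quoteattr(pause)}/>'
        f'{escape(after)}'
        '</prosody></voice></speak>'
    )

if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", "It should arrive on Friday."))
```

Escaping the text and quoting the attribute values keeps the markup well-formed when the input contains characters such as `&` or `<`.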

Deploy anywhere, from the cloud to the edge

Run Text to Speech wherever your data resides. Using containers (preview), build speech applications that combine robust cloud capabilities with edge locality. Speech containers support both standard and custom voices.

Comprehensive privacy and security

  • The Speech service, part of Azure Cognitive Services, is certified by SOC, FedRAMP, PCI DSS, HIPAA, HITECH, and ISO.
  • Your data remains yours. Your text data isn’t stored during data processing or audio generation.
  • View and delete your custom voice data and models at any time. Your data is encrypted while it’s in storage.
  • Backed by Azure infrastructure, the Speech service offers enterprise-grade security, availability, compliance, and manageability.

Flexible pricing gives you the power and control you need

Pay only for what you use, with no upfront costs. With Text to Speech, you pay as you go based on the number of characters you convert to audio.
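Because billing is based on character count, a rough cost estimate is simple arithmetic. The sketch below assumes a hypothetical price per one million characters and counts characters with Python's `len`; the actual rates and the exact rules for what counts as a billable character are not stated here, so treat the numbers as placeholders.

```python
from decimal import Decimal

def estimate_cost(texts, price_per_million_chars):
    """Return (total_chars, estimated_cost) for a batch of input strings.

    `price_per_million_chars` is a hypothetical rate supplied by the caller.
    Characters are counted as Unicode code points, which is an assumption
    about how usage is metered.
    """
    total_chars = sum(len(t) for t in texts)
    cost = (Decimal(total_chars)
            * Decimal(str(price_per_million_chars))
            / Decimal(1_000_000))
    return total_chars, cost

if __name__ == "__main__":
    chars, cost = estimate_cost(["Hello, world!", "您好"], 16)
    print(chars, cost)
```

Using `Decimal` avoids floating-point rounding surprises when the per-character price is very small.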

Guidelines for building responsible synthetic voices
