Speak human, not robot
Use Text to Speech, part of the Speech service, to build apps and services that speak naturally. Bring your solutions to life with dozens of voices in a wide range of languages. Create lifelike voices with the Neural Text to Speech capability built on breakthrough research in speech synthesis technology. Customize models to create a unique voice for your solution and brand.
Enable fluid, natural-sounding speech that matches the stress patterns and intonation of human voices.
Reach global audiences with more than 80 voices and 45 languages and variants.
Build unique, branded voices for your apps, starting from just a few minutes of training data.
Fine-tune voice output for your scenarios by easily adjusting attributes like rate, volume, and pronunciation.
Produce natural-sounding speech
Give your apps a new voice with natural, humanlike intonation and clear articulation. Using deep neural networks, Text to Speech makes the voices of computers expressive and nearly indistinguishable from natural spoken voice.
|The third type, a logarithm of the unsigned fold change, is undoubtedly the most tractable.|
|As the name suggests, the original submarines came from Yugoslavia.|
|This is easy enough if you have an unfinished attic directly above the bathroom.|
|Susan Candiotti reports they've given up their trip.|
|Carol knows my lifestyle.|
|The seagrass fiber is tough, durable, and smooth.|
|Bestimmte Berufsgruppen sind nur noch schwer zu rekrutieren.|
|Sein Gedicht steckt voller Übertreibungen, die für den Schriftsteller allerdings typisch sind.|
|Er organisiert eine Unterstützung der schwächeren durch die stärksten Bundesländer.|
|Tenete conto di un fattore importante.|
|Alcuni prodotti in gran parte sono di buona qualità.|
|Crisi? Vietato rilassarsi, siamo ancora in emergenza.|
Engage global audiences in real time
Convert text to audio in real time, creating fluid conversational experiences. Engage global audiences using more than 80 voices and 45 languages and variants.
|Language||Sample Text|
|English (US)||An airport spokesman said more than 110 planes were damaged by hail.|
|German (DE)||Der Anstieg der Verbraucherpreise in der Eurozone verlangsamt sich weiter.|
|Spanish (ES)||El alcalde de Santiago convoca a los medios para inaugurar dos semáforos.|
|Turkish (TR)||Tren durduğu sırada vagonun ortasında bir patlama meydana geldi.|
Create a unique brand voice
Build your unique voice without a single line of code, starting from just a few minutes of training audio. Develop a highly realistic, humanlike custom voice by using deep neural network models with the Custom Neural Voice capability, which can be used for real-time scenarios and synthesizing long-form audio content.
Want to start building your own voice model?
Easily tailor audio output
Fine-tune your text-to-audio output in real time by controlling parameters including speed, pronunciation, pitch, volume, intonation, and pauses. With neural voices, you can adjust the speaking style to express emotions like cheerfulness or empathy, or to fit specific scenarios like chatting, for a casual tone, or newscasting, for a formal tone. Learn more about voice tuning.
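Tuning of this kind is typically expressed in Speech Synthesis Markup Language (SSML). The sketch below is one possible way to assemble such a document in Python using only the standard library. The helper name `build_ssml`, the default voice `en-US-JennyNeural`, and the `cheerful` style are illustrative assumptions, not values taken from this page. Check the service documentation for the voices and styles actually available to you.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JennyNeural", lang="en-US",
               rate="medium", pitch="medium", volume="medium", style=None):
    """Return an SSML string for `text` (hypothetical helper, not an SDK call).

    rate, pitch, and volume map to attributes of the SSML <prosody> element.
    style, if given, wraps the text in <mstts:express-as>, which only some
    neural voices support (assumption: the chosen voice accepts that style).
    """
    # Escape the text so characters such as & and < stay well-formed XML.
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{escape(text)}</prosody>"
    )
    if style is not None:
        body = (
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


# Usage: a slightly faster, cheerful rendering of one sentence.
ssml = build_ssml("Carol knows my lifestyle.", rate="+10%", style="cheerful")
ET.fromstring(ssml)  # raises ParseError if the markup is not well-formed
print(ssml)
```

The resulting string would then be passed to whichever synthesis call your client library or REST endpoint provides. That step is left out here because it depends on your credentials and region.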
Deploy anywhere, from the cloud to the edge
Run Text to Speech in the cloud or on premises with containers for scenarios where data security and low latency are paramount. Speech containers now support both standard and custom voices. Learn more about Speech in containers.
Security for the enterprise
Microsoft invests over USD 1 billion annually in cybersecurity research and development.
We employ more than 3,500 security experts who are completely focused on securing your data and privacy.
Azure has more certifications than any other cloud provider. View the comprehensive list.
Get the power, control, and customization you need with flexible pricing
Pay only for what you use, with no upfront costs. With Text to Speech, you pay as you go, based on the number of characters you convert to audio.
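Because billing is based on characters converted, a rough cost estimate is simple arithmetic. The sketch below is illustrative only: the function name `estimate_cost` and the price used in the example are hypothetical, and Python's `len` counts Unicode code points, which may differ from how the service counts billable characters. Consult the pricing page for actual rates and counting rules.

```python
def estimate_cost(texts, price_per_million_chars):
    """Return (total_characters, estimated_cost) for a batch of texts.

    price_per_million_chars is a placeholder the caller supplies.
    This page does not state a rate, so none is assumed here.
    """
    total_chars = sum(len(t) for t in texts)
    cost = total_chars * price_per_million_chars / 1_000_000
    return total_chars, cost


# Usage with a made-up price of 16.0 currency units per million characters.
batch = [
    "An airport spokesman said more than 110 planes were damaged by hail.",
    "The seagrass fiber is tough, durable, and smooth.",
]
chars, cost = estimate_cost(batch, price_per_million_chars=16.0)
print(f"{chars} characters, estimated cost {cost:.6f}")
```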
Guidelines for responsible neural voices
Learn about responsible deployment of synthetic voices
Synthetic voices must be designed so that they earn the trust of others. Learn the principles for building synthetic voices that create confidence in your company and services. Read our responsible deployment guidelines.
Obtain consent from voice talent
Help voice talent understand how neural Text to Speech works and how it may be used once they complete the audio recording process. Read our disclosure guidance for voice talent.
Make sure users understand when they’re hearing a synthetic voice, and that voice talent is aware of how their voice will be used. See our disclosure guidelines. Learn about our responsible approach.
Contact us
The Custom Neural Voice capability is in gated preview. Learn more about the gating process and how to get access here.
Get started with Text to Speech in three steps
Developer resources for Text to Speech
Documentation and tutorial
Get started with Text to Speech.
Take a Pluralsight course that walks you through using Text to Speech. Take the course.
Read about preparing data and training your own voice models.
Frequently asked questions about Text to Speech
Standard voices are created using statistical parametric synthesis and concatenation synthesis techniques. These voices are highly intelligible and natural-sounding, and they let your apps speak in more than 45 languages with a wide range of voice options.
Neural voices use deep neural networks to overcome the limits of traditional text-to-speech systems in matching the patterns of stress and intonation in spoken language and in synthesizing units of speech into a computer voice. Standard text-to-speech breaks down prosody into separate steps for linguistic analysis and acoustic prediction that are governed by independent models, which can result in muffled voice synthesis. Our neural capability does prosody prediction and voice synthesis simultaneously, which results in a more fluid and natural-sounding voice.
See the documentation for a full list.
Check the regional availability.