Text to Speech

Convert text to lifelike speech for more natural interfaces

Speak like a person, not a robot

Use Text to Speech, part of the Speech service, to build apps and services that speak naturally. Bring your solutions to life with dozens of voices in a wide range of languages. Create lifelike voices with the Neural Text to Speech capability, built on breakthrough research in speech synthesis technology. Customize models to create a unique voice for your solution and brand.

Lifelike speech

Enable fluid, natural-sounding speech that matches the stress patterns and intonation of human voices.

Global engagement

Reach global audiences with more than 80 voices and 45 languages and variants.

Customized experiences

Build unique, branded voices for your apps, starting from just a few minutes of training data.

Optimized audio

Fine-tune voice output for your scenarios by easily adjusting attributes like rate, volume, and pronunciation.

Produce natural-sounding speech

Give your apps a new voice with natural, humanlike intonation and clear articulation. Using deep neural networks, Text to Speech makes the voices of computers expressive and nearly indistinguishable from natural spoken voice.

English (US): Jessa

Sample sentences
The third type, a logarithm of the unsigned fold change, is undoubtedly the most tractable.
As the name suggests, the original submarines came from Yugoslavia.
This is easy enough if you have an unfinished attic directly above the bathroom.

English (US): Guy

Sample sentences
Susan Candiotti reports they've given up their trip.
Carol knows my lifestyle.
The seagrass fiber is tough, durable, and smooth.

Chinese (CN): Xiaoxiao

Sample sentences
您好,欢迎致电客服中心。我是华北地区的客服人员,工号0165。请问有什么可以帮您?
想和你表白,试了一万种方式,找了一千次时机,但都放弃了,最终只能原地踏步。
负责人Michael透露,新推出的紧凑型SUV搭载了智能的音响系统,可以语音控制volume大小。不过,车身的整体造型还是个secret。

German (DE): Katja

Sample sentences
Bestimmte Berufsgruppen sind nur noch schwer zu rekrutieren.
Sein Gedicht steckt voller Übertreibungen, die für den Schriftsteller allerdings typisch sind.
Er organisiert eine Unterstützung der schwächeren durch die stärksten Bundesländer.

Italian (IT): Elsa

Sample sentences
Tenete conto di un fattore importante.
Alcuni prodotti in gran parte sono di buona qualità.
Crisi? Vietato rilassarsi, siamo ancora in emergenza.

Want to build on this?

Engage global audiences in real time

Convert text to audio in real time, creating fluid conversational experiences. Engage global audiences using more than 80 voices and 45 languages and variants.

Language      Sample text
English (US)  An airport spokesman said more than 110 planes were damaged by hail.
Chinese (CN)  广告收入的比例高达90%以上
Japanese (JP) 皆様のご協力のたまものと
German (DE)   Der Anstieg der Verbraucherpreise in der Eurozone verlangsamt sich weiter.
Spanish (ES)  El alcalde de Santiago convoca a los medios para inaugurar dos semáforos.
Turkish (TR)  Tren durduğu sırada vagonun ortasında bir patlama meydana geldi.
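
The real-time conversion described above can be sketched as a single HTTP request per utterance. This is a minimal sketch, not the service's official client: the endpoint pattern, the header names, the output format string, and the full voice identifiers in `VOICES` are all assumptions to check against the Speech documentation, and `build_tts_request` is a hypothetical helper. The request is only constructed here, not sent.

```python
import urllib.request
from xml.sax.saxutils import escape

# Illustrative locale-to-voice map. The full voice identifiers are assumptions
# derived from the voice names shown on this page.
VOICES = {
    "en-US": "en-US-JessaNeural",
    "zh-CN": "zh-CN-XiaoxiaoNeural",
    "de-DE": "de-DE-KatjaNeural",
    "it-IT": "it-IT-ElsaNeural",
}


def build_tts_request(text, locale, region, subscription_key,
                      output_format="riff-24khz-16bit-mono-pcm"):
    """Build (but do not send) a POST request that asks for synthesized audio."""
    voice = VOICES[locale]
    # Minimal SSML document: one voice element wrapping the escaped text.
    ssml = (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang="{locale}"><voice name="{voice}">{escape(text)}</voice></speak>'
    )
    # Assumed regional endpoint pattern for the Text to Speech REST API.
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,  # assumed auth header
        "Content-Type": "application/ssml+xml",
        "X-Microsoft-OutputFormat": output_format,      # assumed format name
        "User-Agent": "tts-sketch",
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )


req = build_tts_request(
    "Der Anstieg der Verbraucherpreise in der Eurozone verlangsamt sich weiter.",
    locale="de-DE",
    region="westeurope",
    subscription_key="YOUR_KEY",
)
```

Passing `req` to `urllib.request.urlopen` would send the request; with a valid key and region, the response body would be expected to contain the audio in the requested format.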

Create a unique brand voice

Build your unique voice without a single line of code, starting from just a few minutes of training audio. Develop a highly realistic, humanlike custom voice by using deep neural network models with the Custom Neural Voice capability, which can be used for real-time scenarios and synthesizing long-form audio content.

Ready to start building your own custom voice model?

Easily tailor audio output

Fine-tune your text to audio output in real time by controlling parameters including speed, pronunciation, pitch, volume, intonation, and pauses. With neural voices, you can adjust the speaking style to express emotions like cheerfulness or empathy, or to fit specific scenarios like chatting, for a casual tone, or newscasting, for a formal tone.
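
As an illustration of these controls, the sketch below builds an SSML (Speech Synthesis Markup Language) document that sets rate, pitch, volume, a trailing pause, and an optional speaking style. The `speak`, `voice`, `prosody`, and `break` elements come from the W3C SSML standard; the `mstts:express-as` element, its namespace, and the voice identifier `en-US-JessaNeural` are assumptions about the service's extensions, and `build_ssml` is a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice, lang="en-US", rate="medium", pitch="medium",
               volume="medium", style=None, pause_ms=0):
    """Return an SSML string that applies prosody settings to `text`."""
    body = escape(text)
    if pause_ms > 0:
        # Insert a pause after the spoken text.
        body += f'<break time="{int(pause_ms)}ms"/>'
    # rate, pitch, and volume are standard SSML prosody attributes.
    body = (
        f"<prosody rate={quoteattr(rate)} pitch={quoteattr(pitch)} "
        f"volume={quoteattr(volume)}>{body}</prosody>"
    )
    if style is not None:
        # Assumed vendor extension for neural speaking styles.
        body = (
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{body}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


ssml = build_ssml(
    "Thanks for calling. How can I help?",
    voice="en-US-JessaNeural",  # assumed voice identifier
    rate="fast",
    style="cheerful",
    pause_ms=300,
)
```

Which styles a given voice supports, and which attribute values are accepted, depends on the voice; the documentation on voice tuning is the place to confirm them.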

Learn more about voice tuning

Deploy anywhere, from the cloud to the edge

Run Text to Speech in the cloud or on premises with containers for scenarios where data security and low latency are paramount. Speech containers now support both standard and custom voices.

Learn more about Speech in containers

Security for the enterprise

  • Microsoft invests more than USD 1 billion annually in cybersecurity research and development.

  • We employ more than 3,500 security experts who are completely focused on securing your data and privacy.

  • Azure has more certifications than any other cloud provider. View the comprehensive list.

Get the power, control, and customization you need with flexible pricing

Pay only for what you use, with no upfront costs. With Text to Speech, you pay as you go, based on the number of characters you convert to audio.
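
Because billing is per character, a rough cost estimate is simple arithmetic. The sketch below assumes a flat price per one million characters and counts characters with Python's `len`; both are assumptions, since the actual rates and the exact character-counting rules are defined on the pricing page. `estimate_cost` and the price used in the example are hypothetical.

```python
from decimal import Decimal


def estimate_cost(texts, price_per_million_chars):
    """Return (total characters, estimated cost) for a batch of texts."""
    total_chars = sum(len(t) for t in texts)
    # Decimal avoids binary floating-point rounding in currency arithmetic.
    cost = (
        Decimal(total_chars)
        * Decimal(str(price_per_million_chars))
        / Decimal(1_000_000)
    )
    return total_chars, cost


# Hypothetical price, for illustration only.
chars, cost = estimate_cost(["Hello", "world!"], price_per_million_chars=16)
```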

Guidelines for responsible neural voices

Learn about responsible deployment of synthetic voices

Synthetic voices must be designed to earn the trust of others. Learn the principles for building synthetic voices that create confidence in your company and services.

Read our responsible deployment guidelines

Obtain consent from voice talent

Help voice talent understand how neural Text to Speech works and how it may be used once they complete the audio recording process.

Read our disclosure guidance for voice talent

Be transparent

Make sure users understand when they're hearing a synthetic voice, and that voice talent is aware of how their voice will be used.

See our disclosure guidelines

Learn about our responsible approach

Contact us

The Custom Neural Voice capability is in gated preview. Learn more about the gating process and how to get access.

Get started with Text to Speech in three steps

1. Get instant access and a $200 credit by signing up for an Azure free account.
2. Sign in to the Azure portal and add Speech.
3. Learn how to embed Text to Speech from the quickstarts and documentation.

Developer resources for Text to Speech

Documentation and tutorial

Get started with Text to Speech.

Courses

Take a Pluralsight course that walks you through using Text to Speech.

Take the course

Frequently asked questions about Text to Speech

  • Standard voices are created using statistical parametric synthesis and concatenation synthesis techniques. These voices are highly intelligible and sound natural and can be used to let your apps speak in more than 45 languages with a wide range of voice options.

    Neural voices use deep neural networks to overcome the limits of traditional text-to-speech systems in matching the patterns of stress and intonation in spoken language and in synthesizing units of speech into a computer voice. Standard text-to-speech breaks down prosody into separate steps for linguistic analysis and acoustic prediction that are governed by independent models, which can result in muffled voice synthesis. Our neural capability does prosody prediction and voice synthesis simultaneously, which results in a more fluid and natural-sounding voice.
  • See the documentation for a full list.
  • Check the regional availability.

Get Started with Speech