Microsoft previews neural network text-to-speech

Applying the latest in deep learning innovation, Speech Service, part of Azure Cognitive Services, now offers a neural network-powered text-to-speech capability. Access the preview, available today.

Neural Text-to-Speech makes the voices of your apps nearly indistinguishable from the voices of people. Use it to make conversations with chatbots and virtual assistants more natural and engaging, to convert digital texts such as e-books into audiobooks, to upgrade in-car navigation systems with natural voice experiences, and more.

This release includes significant enhancements since we first revealed Neural Text-to-Speech at Ignite earlier this year.

Enhanced voice quality

The voices sound more robust and natural across a wider variety of user scenarios, achieved by harnessing the following:

Large-scale supervised training with transfer learning across diverse speakers
More features from unsupervised pretraining
A more robust neural model design

Accelerated runtime performance

The Neural Text-to-Speech engine now runs at near-instantaneous speed, thanks to extensive code optimization with hardware accelerators, parallel inference models, and model simplifications that balance sound quality against performance. The real-time factor has improved from the previous version to less than 0.05X, meaning 1 second of audio can be generated in less than 50 milliseconds. Producing the first byte of audio is now 6 times faster than before.
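The real-time factor (RTF) is the time spent synthesizing divided by the duration of the audio produced, so an RTF bound converts directly into a time budget per clip. The sketch below only illustrates that arithmetic; the function names are hypothetical and are not part of any Speech Services SDK.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def max_synthesis_ms(rtf: float, audio_seconds: float) -> float:
    """Upper bound on synthesis time in milliseconds for a clip, given an RTF bound."""
    return rtf * audio_seconds * 1000.0


# An RTF of 0.05 means 1 second of audio takes at most 50 ms to generate.
print(f"{max_synthesis_ms(0.05, 1.0):.1f} ms")
```

The same bound scales linearly: under this definition, a 10-second prompt at an RTF of 0.05 would need at most 500 ms of synthesis time.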

Greater service availability

Neural Text-to-Speech has since expanded to three datacenters across the US, Europe, and Asia. Wherever you are in the world, you can integrate neural voices with reduced latency overhead.

With these updates, the Speech Services Neural Text-to-Speech capability offers your users the most natural-sounding voice experience compared with traditional and hybrid system approaches.

You can use this capability starting today with two pre-built neural voices in English: meet Jessa and Guy. Hear what they sound like.
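A specific voice is typically requested with SSML (the W3C Speech Synthesis Markup Language). The minimal sketch below builds such a document with Python's standard library. The voice identifiers `en-US-JessaNeural` and `en-US-GuyNeural` are assumptions about how the two preview voices are named, and how the document is sent to the service (endpoint, authentication, output format) is deliberately not shown.

```python
from xml.etree import ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, voice: str, lang: str = "en-US") -> str:
    """Wrap text in an SSML <speak>/<voice> document for the given voice name."""
    speak = ET.Element("speak", {"version": "1.0", "xmlns": SSML_NS, "xml:lang": lang})
    voice_el = ET.SubElement(speak, "voice", {"name": voice})
    # ElementTree escapes &, < and > in the text, so arbitrary input stays well-formed.
    voice_el.text = text
    return ET.tostring(speak, encoding="unicode")


# Assumed voice names for the two preview voices; check the service documentation.
print(build_ssml("Turn left in 200 meters.", "en-US-JessaNeural"))
print(build_ssml("Chapter one.", "en-US-GuyNeural"))
```

Building the markup with an XML library rather than string formatting avoids producing malformed SSML when the input text contains characters such as `&`.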

Discounts are available during the preview. Visit the Speech Services pricing page for more details.

If you would like to access this capability in Chinese or German, please submit your request.