
Microsoft previews neural network text-to-speech

Applying the latest in deep learning innovation, Speech Service, part of Azure Cognitive Services, now offers a neural network-powered text-to-speech capability. Access the preview, available today.

Neural Text-to-Speech makes the voices of your apps nearly indistinguishable from the voices of people. Use it to make conversations with chatbots and virtual assistants more natural and engaging, to convert digital texts such as e-books into audiobooks, and to upgrade in-car navigation systems with natural voice experiences, among other uses.

This release includes significant enhancements since we first revealed Neural Text-to-Speech at Ignite earlier this year.

Enhanced voice quality

The voices sound more robust and natural across a wider variety of user scenarios, achieved by harnessing the following:

  • Large-scale supervised training with transfer learning across diverse speakers
  • More features from unsupervised pretraining
  • A more robust neural model design

Accelerated runtime performance

Runtime performance of the Neural Text-to-Speech engine is now near-instantaneous. This comes from extensive code optimization for hardware accelerators, parallel inference, and model simplifications that balance sound quality against performance. The real-time factor has improved over the previous version to less than 0.05, meaning 1 second of audio can be generated in less than 50 milliseconds. The first byte of audio is now produced 6 times faster than before.
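To make the real-time factor concrete, here is a minimal sketch of the arithmetic, assuming the usual definition (time spent synthesizing divided by the duration of the audio produced). The function names and the 10-second clip are illustrative only and are not part of the Speech Service API.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """Return synthesis time divided by the duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def max_synthesis_seconds(audio_seconds: float, rtf: float) -> float:
    """Return the longest synthesis time that stays within a given real-time factor."""
    return audio_seconds * rtf


# 1 second of audio generated in 50 ms sits right at the 0.05 boundary.
print(real_time_factor(0.050, 1.0))

# Time budget for a 10-second clip at the same factor (illustrative duration).
print(max_synthesis_seconds(10.0, 0.05))
```

A factor below 1.0 means audio is generated faster than it plays back, so a factor under 0.05 leaves a wide margin for network and streaming overhead.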

Greater service availability

Neural Text-to-Speech has since expanded to three datacenters across the US, Europe, and Asia. Wherever you are in the world, you can integrate neural voices with reduced latency overhead.

 

With these updates, the Neural Text-to-Speech capability in Speech Services offers the most natural-sounding voice experience for your users compared with traditional and hybrid system approaches.

You can use this capability starting today with two pre-built neural voices in English – meet Jessa and Guy. Hear what they sound like.

Discounts are available during the preview. Visit the Speech Services pricing page for more details.

If you would like to access this capability in Chinese or German, please submit your request.