Announcing new voices and emotions to Azure Neural Text to Speech

Azure Neural Text to Speech (Azure Neural TTS), a powerful speech synthesis capability of Azure Cognitive Services, enables developers to convert text to lifelike speech using AI. Enterprises and agencies use Azure Neural TTS for video game characters, chatbots, content readers, and more. The Azure Neural TTS product team is continuously working on bringing new voice styles and emotions to the US market and beyond.

New voice styles and emotional tones

We received feedback from customers that more voice options would help them better apply Azure Neural TTS to different user scenarios. In addition, supporting voice emotions and voice styles would help deliver the most engaging experience to end-users. With that feedback, we decided to add five new neural voices in US-English, expanding from 15 to 20. This includes two female voices—Jane and Nancy—and three male voices—Davis, Jason, and Tony. We also expanded to eight emotional tones for many of our existing and new voices, including cheerful, angry, sad, excited, hopeful, friendly, unfriendly, and terrified. Finally, to improve spatial experiences, we added shouting and whispering.
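Voices and styles like these are typically selected through SSML markup. The following is a minimal sketch, not an official sample: it builds an SSML string that pairs a voice with a style. The helper name `build_ssml` is hypothetical, and the voice identifier `en-US-JaneNeural` and the lowercase style values are assumptions based on the names listed above; check the voice list for the exact identifiers and for which styles each voice supports.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-JaneNeural", style="excited"):
    """Return an SSML document that speaks `text` with a voice and a style.

    `voice` and `style` are assumed identifiers; not every voice
    necessarily supports every style.
    """
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        'xml:lang="en-US">'
        # quoteattr adds the surrounding quotes and escapes the value
        f"<voice name={quoteattr(voice)}>"
        f"<mstts:express-as style={quoteattr(style)}>"
        # escape protects characters such as & and < in the spoken text
        f"{escape(text)}"
        "</mstts:express-as>"
        "</voice>"
        "</speak>"
    )
```

For example, `build_ssml("Stay close and keep quiet.", voice="en-US-DavisNeural", style="whispering")` would produce markup asking for the Davis voice in a whispering style, assuming those identifiers are valid for your resource.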

Listen to how they sound

New voices

Voice	Gender	Sample
Jane	Female	Audio
Davis	Male	Audio
Jason	Male	Audio
Nancy	Female	Audio
Tony	Male	Audio

New emotions

Style	Sample (male)	Sample (female)
Excited	Audio	Audio
Hopeful	Audio	Audio
Friendly	Audio	Audio
Unfriendly	Audio	Audio
Terrified	Audio	Audio

New ways to project

Style or emotion	Sample (male)	Sample (female)
Shouting	Audio	Audio
Whispering	Audio	Audio

We encourage you to try the new voices and emotions. Your feedback will help determine which voices move to General Availability in all regions, depending on customer satisfaction. “By supporting more voice options and expanding voice styles, Azure Speech continues to address the unmet needs of the customers to build more delightful speech experience,” said Binggong Ding, Principal Group Product Manager of the Microsoft Speech team.

See the full list of US-English voices here.

Three ways customers are using this

Content reading is a popular use case for Azure Neural TTS customers. Microsoft has plugins to enable Read Aloud across the web, and this use case also improves accessibility for customers with vision challenges. The new voices, together with eight emotional tones plus shouting and whispering, open up many possibilities for improving the customer experience.

Azure Neural TTS accelerates the scaling of character voice production. Lifelike voices for video game characters can be created quickly to bring your virtual worlds to life and delight gamers. Emotional tones such as terrified and friendly add more personality to game experiences.

Long gone are the days of frustrating voice assistants and chatbots: you can now deliver lifelike conversational experiences. Call centers can scale operations while also improving customer satisfaction.

Featured customers

Undead Labs is on a mission to take gaming in bold new directions. They are the makers of the State of Decay franchise and use Azure Neural TTS during game development. Double Fine, which has produced many popular games, including Psychonauts 2, is using Azure Neural TTS to prototype future game projects. Remixd (recently acquired by Global) uses Azure Neural TTS, including the Jenny and Davis voices, for one of its music radio media clients.

International reach

Engage global audiences by using more than 340 neural voices across 129 languages and variants. Bring your scenarios like text readers and voice-enabled assistants to life with highly expressive and human-like voices.

Neural TTS and Responsible AI

We are excited about the future of Azure Neural TTS, which aims for human-like, diverse, and delightful voice quality under the high-level architecture of the XYZ-Code AI framework. Our technology advancements are also guided by Microsoft’s Responsible AI process and by our principles of fairness, inclusiveness, reliability and safety, transparency, privacy and security, and accountability. We put these ethical standards into practice through three groups: the Office of Responsible AI (ORA), which sets our rules and governance processes; the AI Ethics and Effects in Engineering and Research (Aether) Committee, which advises our leadership on the challenges and opportunities presented by AI innovations; and Responsible AI Strategy in Engineering (RAISE), a team that enables the implementation of Microsoft Responsible AI rules across engineering groups.

Get started

Start building new customer experiences with Azure Neural TTS. In addition, the Custom Neural Voice capability enables organizations to create a unique brand voice in multiple languages and styles.
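As one possible starting point, the sketch below prepares an HTTP request for the Speech service's text-to-speech REST endpoint using only the Python standard library. It is an illustration under stated assumptions rather than an official sample: the helper name `build_tts_request` and the `User-Agent` value are hypothetical, the region and key are placeholders you supply, and the endpoint pattern, header names, and output format should be verified against the current REST API documentation.

```python
import urllib.request


def build_tts_request(ssml, region, subscription_key,
                      output_format="riff-24khz-16bit-mono-pcm"):
    """Prepare (but do not send) a POST request carrying SSML to synthesize.

    Assumes the regional endpoint pattern and header names of the
    text-to-speech REST API; confirm them for your resource.
    """
    url = f"https://{region}.tts.speech.microsoft.com/cognitiveservices/v1"
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,  # your resource key
        "Content-Type": "application/ssml+xml",         # body is SSML
        "X-Microsoft-OutputFormat": output_format,      # requested audio format
        "User-Agent": "neural-tts-sketch",              # hypothetical app name
    }
    return urllib.request.Request(
        url, data=ssml.encode("utf-8"), headers=headers, method="POST"
    )
```

To actually synthesize audio, you would pass the returned request to `urllib.request.urlopen` and write the response bytes to a `.wav` file. That step needs a valid key and network access, so it is left out here.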
