Accelerate the in-vehicle digital experience with Azure Cognitive Services | Azure Blog

Microsoft is helping to reshape the automotive industry in the way it serves its drivers with in-vehicle infotainment systems. As an example, Azure is partnering with XPENG to enable AI voice experiences for automotive brands and customers. The solution provides the industry with a fresh take on text-to-speech and expressive voice, global languages, speaker fidelity, and self-service customization. XPENG joins a growing trend of automakers rethinking investments in environmental voice.

“This is a cutting-edge exploration of vehicle voice interaction in the auto industry,” XPENG automotive AI product senior expert Hao Chao said. “The experience delivers a whole new level of natural speech. With a deep understanding of urban mobility, we are finding many more scenarios to leverage AI technology for a high level of driver-machine intuition.”

XPENG tapped into Microsoft’s neural text-to-speech technology for its in-car user experience. By using neural text-to-speech with emotional styles, XPENG can provide a more delightful listening experience for its customers and combat listening fatigue. Microsoft’s neural text-to-speech provides fluency and naturalness comparable to a human voice. Coupled with multi-emotional voices, it acts as a refreshing replacement for the monotonous sound many car assistants have today.

“We are excited to reimagine how speech and voice can improve the lives of drivers,” Azure AI Speech Product Lead Binggong Ding said. “While from a technical point of view, we really want to make this a model that can serve all auto brands and their developers. How can we best optimize the use of synthetic speech to enable a high-fidelity voice experience without compromising sound quality? XPENG is building upon this challenge to provide a voice assistant that customers have been looking for.”

Microsoft’s long-term goal is to make advanced multi-emotional, global voice capabilities the new standard for global car brands and consumers. The technology adopted by XPENG adds dozens of voice styles, unique emotional intensity control, and deduction abilities. It is covered by 90 compliance certifications worldwide, spanning domestic policies, regulatory data center requirements, EU GDPR, and higher data privacy requirements for policy holders. Together with the car manufacturers, Microsoft is creating new driving experiences with speech based on the text-to-speech and speech-to-text capabilities within Azure Cognitive Services for speech.
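As a rough illustration of what emotional intensity control can look like in practice, the sketch below builds an SSML request that wraps text in an `mstts:express-as` element with a `styledegree` attribute. The helper name `styled_ssml`, the default voice `en-US-JennyNeural`, and the sample text are assumptions made for illustration; the post does not say which voices or styles XPENG uses, and whether a given voice honors a given style or degree should be checked against the service documentation.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumed bounds for the styledegree attribute (0.01 to 2, with 1 as the default).
MIN_DEGREE, MAX_DEGREE = 0.01, 2.0


def styled_ssml(text, style, degree=1.0, voice="en-US-JennyNeural", lang="en-US"):
    """Return an SSML string that speaks `text` in `style` at a clamped intensity.

    `voice` and `lang` are placeholder defaults, not values taken from the post.
    """
    # Keep the requested intensity inside the assumed valid range.
    degree = max(MIN_DEGREE, min(MAX_DEGREE, degree))
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        f'<mstts:express-as style={quoteattr(style)} styledegree="{degree:g}">'
        f"{escape(text)}"  # escape &, <, > so the document stays well-formed
        "</mstts:express-as></voice></speak>"
    )


if __name__ == "__main__":
    print(styled_ssml("Your destination is on the right.", "cheerful", degree=1.5))
```

Clamping on the client side is a design choice in this sketch: an out-of-range value is pulled back to the nearest bound rather than being sent to the service as-is.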

Accelerated speech innovation

Voice is the new interface in ambient computing technology. The quality of text-to-speech and speech-to-text has improved in recent years thanks to research and technological leaps enabled by neural networks. High-quality speech-to-text and text-to-speech give automakers what they need to create the next-generation in-car speech experience. Microsoft speech-to-text offers robust, speaker-independent recognition that can handle ambient noise while driving. Microsoft text-to-speech features a more fluid, natural-sounding voice, which can be a differentiator for automakers and customers alike. Both speech-to-text and text-to-speech also increase hands-free control of the car infotainment system. Microsoft text-to-speech supports several speaking styles, including chat, newscast, and customer service. These advancements give drivers a more delightful driving experience. For more information about recent advancements, see the research results for speech-to-text, which reached human parity on the Switchboard research benchmark, and for neural text-to-speech, which is close to human parity.
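To make the speaking styles mentioned above more concrete, here is a minimal sketch that assembles one SSML document in which different infotainment messages use different styles. The scenario names, the `SCENARIO_STYLES` mapping, the `build_ssml` helper, and the default voice `en-US-JennyNeural` are all hypothetical; the style identifiers `chat`, `newscast`, and `customerservice` are assumed to be supported by the chosen voice, which should be verified before use.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"
MSTTS_NS = "https://www.w3.org/2001/mstts"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize SSML elements without a prefix and express-as with the mstts prefix.
ET.register_namespace("", SSML_NS)
ET.register_namespace("mstts", MSTTS_NS)

# Hypothetical mapping from an in-car scenario to a speaking style.
SCENARIO_STYLES = {
    "assistant_reply": "chat",
    "news_briefing": "newscast",
    "service_reminder": "customerservice",
}


def build_ssml(segments, voice="en-US-JennyNeural", lang="en-US"):
    """Build SSML from (scenario, text) pairs, one express-as element per pair."""
    speak = ET.Element(
        f"{{{SSML_NS}}}speak",
        {"version": "1.0", f"{{{XML_NS}}}lang": lang},
    )
    voice_el = ET.SubElement(speak, f"{{{SSML_NS}}}voice", {"name": voice})
    for scenario, text in segments:
        style = SCENARIO_STYLES[scenario]  # raises KeyError for unknown scenarios
        segment = ET.SubElement(
            voice_el, f"{{{MSTTS_NS}}}express-as", {"style": style}
        )
        segment.text = text  # ElementTree escapes special characters on output
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml([
        ("assistant_reply", "Sure, I turned the temperature down."),
        ("news_briefing", "Traffic on your route is light this morning."),
    ]))
```

The resulting string is only the request payload; sending it to the speech service (for example through the Speech SDK) is left out here because it needs credentials and a network connection.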

Offering global languages

Microsoft helps automakers cover their global business. The speech service recently passed a milestone of 100 languages and now supports 119 languages and variants, with 278 voices available out of the box. This is aligned with our company vision to empower every person and organization on the planet to achieve more. “One hundred languages is a good milestone for us to achieve our ambition for everyone to be able to communicate regardless of the language they speak,” said Xuedong Huang, Microsoft Technical Fellow and Azure AI Chief Technology Officer. With more languages and variants covered, we’re excited to be powering natural and intuitive voice experiences for automakers.

Differentiation with customization

Microsoft empowers automakers to develop a highly realistic branded voice for more natural conversational interfaces using the custom neural voice capability. Based on the neural text-to-speech technology and the multi-lingual, multi-speaker universal model, custom neural voice lets you create synthetic voices that are rich in speaking styles or adaptable across languages with as little as 30 minutes of audio. The realistic, natural-sounding voice can represent brands and specific personas and lets users interact with applications naturally in a conversational style. Check out this blog for a step-by-step guide on how to create a custom neural voice.

Compliance and responsible AI

Microsoft is committed to investing in regulatory standards around the globe so that automakers can meet their compliance requirements. The speech service, part of Azure Cognitive Services, is certified for SOC, FedRAMP, PCI DSS, HIPAA, HITECH, and ISO. Backed by Azure infrastructure, the speech service also offers enterprise-grade security, availability, compliance, and manageability.

Microsoft is committed to developing AI technology in a responsible way. We use different technical and policy features to safeguard against misuse of the technology. For example, we are designing and releasing Custom Neural Voice with the intention of protecting the rights of individuals and society, fostering transparent human-computer interaction, and counteracting the proliferation of harmful deepfakes and misleading content. This aligns with Microsoft’s commitment to responsible AI. That commitment includes Transparency Notes, which communicate the purpose, capabilities, and limitations of an AI system.

Learn more

Azure Cognitive Services brings AI within reach. Learn how you can accelerate innovation with breakthrough AI research.