
The Microsoft Azure Cognitive Speech Services platform is a comprehensive collection of technologies and services that help developers add speech to their applications and differentiate them in the market. The services include Speech to Text, Text to Speech, Custom Neural Voice (CNV), Conversation Transcription, Speaker Recognition, Speech Translation, the Speech SDK, and the Speech Device Development Kit (DDK).

AI for education is an emerging technology that has the potential to revolutionize the way we teach and learn languages. One of the most important aspects of language learning is the ability to pronounce words accurately, and this is where Azure Cognitive Speech Service's new Pronunciation Assessment feature comes in. Another key opportunity is the development of synthetic bilingual voices for language learning experiences with Custom Neural Voice, in addition to our speech-to-text capabilities.

1. Pronunciation Assessment

The new feature is designed to provide instant feedback to users on the accuracy, fluency, and prosody of their speech when learning a new language. The service utilizes Azure Neural Text-to-Speech and Transformer models, along with ordinal regression and a hierarchical structure, to improve the accuracy of word-level assessment. The service is currently available in more than 10 languages, including American English, British English, Australian English, French, Spanish, and Chinese, with additional languages in preview.

The Pronunciation Assessment feature offers several benefits for educators, service providers, and students:

  • For educators, it provides instant feedback, eliminates the need for time-consuming oral language assessments, and offers consistent and comprehensive assessments.
  • For service providers, it offers strong real-time performance, a speech service that is available worldwide, and support for growing a global business.
  • For students and learners, it provides a convenient way to practice and receive feedback, offers authoritative scoring that compares their speech with native pronunciation, and helps them follow the exact text order when reading long sentences or full documents.

Pronunciation Assessment is a powerful tool for language learning and teaching. By leveraging AI technologies such as TTS, Transformer, and Ordinal Regression, it provides instant and accurate feedback on speech pronunciation. With its wide range of supported languages and its ability to work with low-resource locales, it offers language learners of all backgrounds the opportunity to improve their language skills. With Pronunciation Assessment, educators can offer a more engaging and accessible learning experience, service providers can improve education customers' productivity, and students can practice more conveniently anywhere and anytime.

At the Microsoft Reimagine Education event on February 9, 2023, we announced several new features to support student success. Pronunciation Assessment is used in Reading Coach in Immersive Reader and in Speaker Progress in Microsoft Teams. These tools can be used inside and outside the classroom to save teachers time and improve students' reading fluency, and they are accessible to all learners.
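For developers who call the service over REST rather than through the Speech SDK, the assessment settings travel in a `Pronunciation-Assessment` request header as base64-encoded JSON. The following is a minimal sketch of building that header value. The function name is hypothetical, and the parameter set shown (`ReferenceText`, `GradingSystem`, `Granularity`, `Dimension`) is our understanding of the commonly used options, so check the current Speech service reference before relying on it.

```python
import base64
import json


def build_pronunciation_assessment_header(reference_text: str,
                                          granularity: str = "Word") -> str:
    """Return a base64-encoded JSON string for the Pronunciation-Assessment header.

    Hypothetical helper: the parameter names are assumed to match the
    Speech service REST API; verify them against the official reference.
    """
    params = {
        "ReferenceText": reference_text,   # the text the learner is asked to read
        "GradingSystem": "HundredMark",    # scores reported on a 0-100 scale
        "Granularity": granularity,        # level of detail in the returned scores
        "Dimension": "Comprehensive",      # request fluency/completeness, not only accuracy
    }
    raw = json.dumps(params).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    header_value = build_pronunciation_assessment_header("Good morning.")
    # The value would be sent alongside the audio in a speech-to-text request.
    print({"Pronunciation-Assessment": header_value})
```

Encoding the settings once per reference sentence keeps the request itself unchanged: the same audio upload is scored against whatever text the header names.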

2. Speech-to-Text

Teachers and language learners naturally mix their native language with the language being learned during a lesson. Azure Speech to Text supports real-time language identification for these multilingual learning scenarios, which makes human-to-human conversations easier to follow and produces more readable transcripts.

We used the latest multilingual modeling technology and transfer learning techniques to develop new speech-to-text (STT) languages from vast amounts of data. These models learn acoustic and linguistic knowledge across different languages, and they can handle both dictation and conversation in a variety of language domains. The output includes Inverse Text Normalization (ITN), capitalization (when appropriate), and automatic punctuation to enhance readability. Developers can integrate these languages into their projects using either a real-time streaming application programming interface (API) or batch transcription. Because a single unified model serves all languages, the same integration works for each of them.
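As a rough illustration of the batch transcription path, the sketch below assembles a JSON request body for a set of recorded lessons, with optional candidate locales for language identification. The helper name, display name, and audio URL are hypothetical, and the field names (`contentUrls`, `locale`, `displayName`, `punctuationMode`, `wordLevelTimestampsEnabled`, `languageIdentification.candidateLocales`) reflect our understanding of the batch transcription REST API, so treat them as assumptions to confirm against the current reference.

```python
import json


def build_batch_transcription_request(audio_urls, locale,
                                      candidate_locales=None,
                                      display_name="Language lesson recordings"):
    """Return a JSON request body for a batch transcription job.

    Hypothetical helper: field names are assumed to match the batch
    transcription REST API and should be verified before use.
    """
    properties = {
        "punctuationMode": "DictatedAndAutomatic",  # automatic punctuation in the output
        "wordLevelTimestampsEnabled": True,
    }
    if candidate_locales:
        # Let the service choose among the languages a lesson may contain.
        properties["languageIdentification"] = {
            "candidateLocales": list(candidate_locales)
        }
    body = {
        "contentUrls": list(audio_urls),
        "locale": locale,
        "displayName": display_name,
        "properties": properties,
    }
    return json.dumps(body, indent=2)


if __name__ == "__main__":
    # example.com is a placeholder, not a real recording.
    print(build_batch_transcription_request(
        ["https://example.com/lesson-01.wav"],
        locale="en-US",
        candidate_locales=["en-US", "es-ES"],
    ))
```

Batch transcription suits recorded lessons that can be processed later, while the real-time streaming API is the better fit when feedback is needed during the conversation.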

3. Prebuilt and Custom Neural Voice (CNV)

Neural voice (Text-to-Speech) can read learning materials aloud in a native-sounding voice and supports self-paced learning anytime, anywhere. Microsoft Azure AI provides more than 449 prebuilt neural voices across 147 languages and variants, enabling scenarios such as AI teachers and content read-aloud.
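Text sent to a neural voice is typically wrapped in Speech Synthesis Markup Language (SSML), which names the voice that should read it. The sketch below builds such a document with the Python standard library only. The helper name is hypothetical, `en-US-JennyNeural` is used as an example of a prebuilt voice, and sending the SSML to the service is left out.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NAMESPACE = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str,
               voice: str = "en-US-JennyNeural",
               lang: str = "en-US") -> str:
    """Wrap learning material in SSML for a chosen neural voice.

    Hypothetical helper: the voice name is an example; any available
    voice name could be passed instead.
    """
    return (
        f'<speak version="1.0" xmlns="{SSML_NAMESPACE}" xml:lang={quoteattr(lang)}>'
        # escape() keeps characters such as & and < from breaking the XML.
        f"<voice name={quoteattr(voice)}>{escape(text)}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Repeat after me: the weather is nice today."))
```

Escaping the text matters for learning materials in particular, because exercises often contain symbols such as `&` or `<` that would otherwise produce invalid markup.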

Custom Neural Voice (CNV) is a feature offered by Azure AI that enables users to create a unique, customized, synthetic voice for their applications. This feature uses human speech samples as training data to generate a highly natural-sounding voice for a brand or for characters. Education companies are using this technology to personalize language learning by creating unique characters with distinct voices that match the culture and background of their target audience. For example, Duolingo used Custom Neural Voice to help bring nine new characters to life within its language learning platform, and Pearson used it to improve pronunciation assessment. CNV is based on neural text-to-speech technology and allows users to create synthetic voices that are rich in speaking styles, cross-lingual, and adaptable. The realistic, natural-sounding voice is well suited to representing brands and personifying machines for conversational interactions with users.

Customer Inspiration

As technology continues to advance, it's becoming increasingly clear that the future of education lies in the integration of AI. Azure AI is at the forefront of this revolution, providing education companies with powerful tools to improve the learning experience and drive student engagement and achievement. We are inspired by five customers in the education space:

  1. Pearson: The company wanted to use AI to deliver better services to students and empower teachers with highly accurate assessments, using Azure to develop AI-based services for language learners. They adopted new Microsoft algorithms and a leading-edge pronunciation assessment feature, which is a part of the Speech to Text capability.
  2. Beijing Hongdandan Visually Impaired Service Center: The organization is working with Microsoft and a team of volunteers to generate AI audio content, which will be used to improve resources for people who are blind or have low vision. They used Azure Custom Neural Voice, a text-to-speech tool that allows users to create custom voice fonts, to generate the audio content.
  3. Duolingo: The language learning company is using Custom Neural Voice to personalize language learning by introducing a cast of characters within the platform. Duolingo went through hundreds of iterations of characters, aiming for them to reflect the cultures of its user base around the world while aligning visually with the app's longstanding main character. They used Custom Neural Voice to bring nine new characters to life within the language learning platform.
  4. HelloTalk: The innovative mobile app provides an enjoyable and effortless way to learn a new language by connecting users with native speakers from around the world. With its intuitive language tools, including its Pronunciation Assessment feature, and community features, it enables users to practice and immerse themselves in the culture of their target language, improve their pronunciation, and make new friends in the process.
  5. Berlitz: The global leadership and language training company provides language learning products that use Azure speech recognition and pronunciation assessment. Through these innovative tools, learners instantly receive detailed feedback on the accuracy and fluency of their speech in the new language. This gives Berlitz learners the flexibility to practice and perfect their pronunciation anywhere, anytime before speaking with native speakers in English, German, Spanish, and more.

The future impact of AI in education

The integration of AI, specifically speech services, into the education sector is becoming increasingly important because it can greatly enhance the learning experience and improve the effectiveness of teaching. Speech services such as Azure Pronunciation Assessment and Custom Neural Voice bring personalization, automation, and analytics to education platforms, which can lead to better student engagement and achievement. These services also enable educators to provide instant feedback on speech accuracy, fluency, and completeness, which helps language learners improve their pronunciation. With the ability to assess pronunciation in real time, AI-powered speech services can make language assessment more engaging and accessible to learners of all backgrounds. They can also tailor the learning experience to each student by providing feedback and recommendations based on individual needs. The integration of AI into the education sector can help educators empower students, and help students achieve their full potential.

Get started with Azure Cognitive Services 

Check out these features in Speech Studio using a no-code approach. Speech Studio is a set of UI-based tools for building AI services into your applications.
