Announcing Custom Speech Service (Preview) from Microsoft Cognitive Services

We are excited to announce the public preview release of the Custom Speech Service from Microsoft Cognitive Services. The Custom Speech Service (formerly the Custom Recognition Intelligent Service) lets you customize Microsoft’s speech-to-text engine. By uploading text and/or speech data to the Custom Speech Service that reflects your application and your users, you can create custom models that can be combined with Microsoft’s state-of-the-art speech models and deployed to a custom speech-to-text endpoint, accessible from any device.

Why customize the speech-to-text engine?

Speech recognition systems are composed of several components. Two of the most important are the acoustic model and the language model. The acoustic and language models behind Microsoft’s world-class speech recognition engine have been optimized for common usage scenarios, such as interacting with Cortana on your smartphone, tablet, or PC, searching the web by voice, or sending text messages to a friend.

If your application contains particular vocabulary items, such as product names or jargon that rarely occur in typical speech, it is likely that you can obtain improved performance by customizing the language model.

For example, if you are building an app to assist automotive mechanics, terms like “powertrain,” “catalytic converter,” or “limited slip differential” will appear more frequently in this application than in typical voice applications. Customizing the language model enables the system to learn this.
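To make the intuition concrete, here is a minimal sketch of why extra in-domain text helps. It uses a toy add-one-smoothed unigram model, which is an assumption made purely for illustration and not a description of how the Custom Speech Service builds its models. The sample sentences and the `unigram_probs` helper are hypothetical.

```python
from collections import Counter


def unigram_probs(corpus, vocab, alpha=1.0):
    """Add-alpha smoothed unigram probabilities over a fixed vocabulary."""
    counts = Counter(
        word for sentence in corpus for word in sentence.lower().split()
    )
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


# Hypothetical "typical usage" text, standing in for general training data.
general_text = [
    "send a text message to my friend",
    "search the web for pizza near me",
    "what is the weather today",
]

# Hypothetical domain text, standing in for what an app developer might upload.
domain_text = [
    "inspect the powertrain for wear",
    "replace the catalytic converter",
    "the powertrain warning light is on",
]

vocab = sorted({w for s in general_text + domain_text for w in s.lower().split()})

base = unigram_probs(general_text, vocab)
custom = unigram_probs(general_text + domain_text, vocab)

# The domain term becomes more probable once domain text is included.
print(f"powertrain: {base['powertrain']:.4f} -> {custom['powertrain']:.4f}")
```

In this toy setting, a word the general text never contained receives only the smoothing mass until domain text is added, which is the same effect, in miniature, that motivates customizing the language model.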

Similarly, customizing the acoustic model can enable the system to do a better job recognizing speech in particular environments or from particular user populations. For example, if you have a voice-enabled app designed for use in a warehouse or factory, a custom acoustic model can more accurately recognize speech in the presence of the noises found in these environments.

How do I get started?

Visit www.cris.ai to learn how to create and deploy custom speech-to-text models. The site provides a simple interface for importing text and/or audio data, creating custom acoustic and language models, and evaluating their performance. The custom models can be deployed in conjunction with Microsoft’s existing state-of-the-art models to create custom speech-to-text endpoints.

We’ve made some sample text data for building and testing a custom language model available on the Custom Speech Service GitHub page. A model built from this data will enable you to build an application that can transcribe facts about dinosaurs, because, you know, everybody loves dinosaurs.

We welcome your feedback and questions at https://cognitive.uservoice.com/.