Speech services July 2018 update

Posted on July 18, 2018

Senior Program Manager, Speech Services

A lot has happened since we announced that Speech services is now in preview. Since then, we have released the Cognitive Services Speech SDK June 2018 update.

Today, we are excited to announce that we have just released version 0.5.0 of the Speech SDK. With this update, we have added support for UWP (on Windows version 1709), .NET Standard 2.0 (on Windows), and Java on Android 6.0 (Marshmallow, API level 23) or higher. We have also changed some features and fixed some bugs. Most notably, we now support long-running audio and automatic reconnection, which makes the Speech service more resilient overall in the event of timeouts, network failures, or service errors. We’ve also improved the error messages to make errors easier to handle. Please visit the Release Notes page for details. We will continue to add support for more platforms and programming languages as we work toward making the Speech SDK generally available this fall.

Besides the Speech SDK, Custom Voice has also released a new feature to support more training data formats. All ‘.wav’ files (RIFF) with a sampling rate equal to or higher than 16 kHz are now accepted. Furthermore, we have extended support to more plain text encoding types (ANSI/UTF-8/UTF-8-BOM/UTF-16-LE/UTF-16-BE). For more details, visit our docs about how to prepare data and customize voice fonts. We have also published a new document to help you create high-quality audio samples of human speech, with a focus on issues that you are likely to encounter while preparing your voice training data. For more details, see how to record voice samples for a custom voice.
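As a rough illustration of the format requirements above, the following Python sketch pre-checks training data locally before upload. It is not part of the Speech SDK or the Custom Voice portal. The helper names `wav_sample_rate_ok` and `detect_bom` are hypothetical, and the check relies only on Python's standard `wave` and `codecs` modules, which read PCM-encoded RIFF files and cannot tell ANSI apart from UTF-8 without a BOM.

```python
import codecs
import wave
from typing import Optional

# Threshold taken from the post: 16 kHz or higher is accepted.
MIN_SAMPLE_RATE_HZ = 16_000


def wav_sample_rate_ok(source) -> bool:
    """Hypothetical helper: True if `source` (a path or a binary file
    object) is a readable RIFF .wav file sampled at 16 kHz or higher."""
    try:
        with wave.open(source, "rb") as wav_file:
            return wav_file.getframerate() >= MIN_SAMPLE_RATE_HZ
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE file, truncated, or an encoding the stdlib
        # `wave` module does not understand.
        return False


def detect_bom(data: bytes) -> Optional[str]:
    """Hypothetical helper: name the encoding implied by a byte-order
    mark, or None when no BOM is present (for example ANSI or plain
    UTF-8, which this sketch does not try to distinguish)."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8-BOM"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "UTF-16-LE"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "UTF-16-BE"
    return None
```

You could call `wav_sample_rate_ok("sample.wav")` on each recording and `detect_bom(open("script.txt", "rb").read(4))` on each transcript; both file names here are placeholders. The service's own validation remains authoritative.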

In addition, we are very happy to announce new content for our Speech (Preview) documentation.

The content update aims to help developers quickly navigate to the right content, based on the type of application they are developing.

We have a new, separate section on the end-to-end customization process, including acoustic adaptation, language adaptation, pronunciation, and voice fonts. We’ve also added documentation about the Batch Transcription API, which is ideal for customers that have large quantities of audio files in storage.

The documentation also complements this SDK update with the following sections:

  • Brand-new Scenario section to help you navigate the documentation according to your application's needs.
  • Consolidated end-to-end Customization section (including data and a tutorial on GitHub).
  • Brand-new Batch Transcription API section, including a GitHub sample.
  • More detailed and elaborate FAQ sections for each of the sub-services, under Resources.

The documentation is live now. Please use the Feedback section at the bottom of the documentation pages to tell us what you think.

Interested in the Microsoft Speech services? You can try it out for free. To learn more and review sample code, please refer to our documentation page. Please follow us on Twitter @msspeech3 to be notified of future updates.