A look at Azure's automated machine learning capabilities

Posted on 4 June, 2019

Principal Program Manager, Azure Machine Learning

The automated machine learning capability in Azure Machine Learning service allows data scientists, analysts, and developers to build machine learning models with high scalability, efficiency, and productivity, all while sustaining model quality. Automated machine learning builds a set of machine learning models automatically, intelligently selecting models for training and then recommending the best one for your scenario and data set. Traditional machine learning model development, by contrast, is resource-intensive, requiring both significant domain knowledge and time to produce and compare dozens of models.

With the announcement of automated machine learning in Azure Machine Learning service as generally available last December, we have started the journey to simplify artificial intelligence (AI). This helps data scientists who want to automate part of their machine learning workflow so they can spend more time focusing on other business objectives. It also makes AI available for a wider audience of business users who don’t have advanced data science and coding knowledge.

We are furthering our investment in accelerating productivity with this release, which includes new capabilities and features in the areas of model quality, model transparency, ONNX support, a code-free user interface, time series forecasting, and product integrations.

1. Automated machine learning no-code web interface (preview)

Continuing our mission to simplify machine learning, Azure introduced the automated machine learning web user interface in the Azure portal. The web user interface enables business domain experts to train models on their data without writing a single line of code. Users can simply bring their data and, with a few clicks, start training on it. After automated machine learning finds the best model for the user's data, they can deploy that model to Azure Machine Learning service as a web service to generate predictions on new data.

To start exploring the automated machine learning UI, go to the Azure portal and navigate to an Azure Machine Learning workspace, where you will see “Automated machine learning” under the “Authoring” section. If you don’t have an Azure Machine Learning workspace yet, you can learn how to create a workspace. To learn more, refer to the automated machine learning UI blog.

GIF image for creating a new automated machine learning experiment

2. Time series forecasting

Building forecasts is an integral part of any business, whether it’s revenue, inventory, sales, or customer demand. Forecasting with automated machine learning is now generally available. These capabilities improve the accuracy and performance of recommended models for time series data. They include a forecast function for generating predictions, rolling cross validation splits for time series data, configurable lags, window aggregation, and a holiday featurizer. Together, they help produce high-accuracy forecasting models and support machine learning automation across many scenarios.
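To make a few of these terms concrete, here is a minimal sketch in plain Python of rolling cross validation splits, lag features, and a window aggregate. It is an illustration of the general techniques only, not the implementation used by automated machine learning, and the function names (rolling_origin_splits, add_lag_and_window_features) and the sample demand values are hypothetical.

```python
from statistics import mean


def rolling_origin_splits(n_samples, n_splits, horizon):
    """Yield (train_indices, test_indices) pairs for time-ordered data.

    Each fold trains on everything before a cutoff and tests on the next
    `horizon` points, so a fold never trains on data from its own future.
    """
    first_cutoff = n_samples - n_splits * horizon
    if first_cutoff <= 0:
        raise ValueError("not enough samples for the requested splits")
    for fold in range(n_splits):
        cutoff = first_cutoff + fold * horizon
        yield list(range(cutoff)), list(range(cutoff, cutoff + horizon))


def add_lag_and_window_features(series, lags, window):
    """Return one feature row (a dict) per time step.

    Lag features copy earlier target values. The rolling mean aggregates
    the `window` values strictly before the current step. Features that
    would reach before the start of the series are set to None.
    """
    rows = []
    for t, value in enumerate(series):
        row = {"y": value}
        for lag in lags:
            row[f"lag_{lag}"] = series[t - lag] if t >= lag else None
        past = series[t - window:t] if t >= window else None
        row[f"rolling_mean_{window}"] = mean(past) if past else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    demand = [10, 12, 13, 15, 14, 18, 21, 20]  # hypothetical daily demand
    for train, test in rolling_origin_splits(len(demand), n_splits=3, horizon=2):
        print("train", train, "test", test)
    print(add_lag_and_window_features(demand, lags=[1, 2], window=3)[3])
```

The key design point is that every test window lies after its training window. A random shuffle, as used in ordinary cross validation, would leak future values into training and overstate forecast accuracy.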

To learn more, refer to the how-to guide for time series data and the samples on GitHub.

3. Model transparency

We understand transparency is very important for you to trust the models recommended by automated machine learning.

  • Now you can understand all steps in the machine learning pipeline, including automated featurization (if you set preprocess=True). Learn more about all the preprocessing and featurization steps that automated machine learning performs. You can also programmatically inspect how your input data was preprocessed and featurized, what kind of scaling and normalization was applied, and the exact machine learning algorithm and hyperparameter values for a chosen machine learning pipeline. Follow these steps to learn more.
  • Model interpretability (feature importance) was enabled as a preview capability back in December. Since then, we have made improvements, including a significant performance boost.

4. ONNX Models (preview)

In many enterprises, data scientists build models in Python since the popular machine learning frameworks are in Python. Many Azure Machine Learning service users also create models using Python. However, in many deployment environments, line-of-business applications are written in C# or Java, requiring users to “recode” the model. This adds so much friction that models often never get deployed into production. With ONNX support, users can build ONNX models using automated machine learning and integrate them with C# applications without recoding.

To find out more, see the GitHub notebook.

5. Enabling .NET developers using Visual Studio/VS Code (preview)

Empower your applications with automated machine learning while remaining in the comfort of the .NET ecosystem. The .NET automated machine learning API enables developers to leverage automated machine learning capabilities without needing to learn Python. Seamlessly integrate automated machine learning within your existing .NET project by using the API's NuGet package. Tackle your binary classification, multiclass classification, and regression tasks within Visual Studio and Visual Studio Code.

6. Empowering data analysts in Power BI (preview)

We have enabled data analysts and BI professionals using Power BI to build, deploy, and run inference with machine learning models, all within Power BI. This integration allows Power BI customers to use their data in Power BI dataflows and the automated machine learning capability of Azure Machine Learning service to build models with a no-code experience, and then deploy and use the models from Power BI. Imagine the kind of machine learning powered Power BI applications and reports you can create with this capability.

7. Automated machine learning in SQL Server

If you are looking to build models from your data in SQL Server using your favorite SQL Server Management Studio interface, you can now leverage automated machine learning in Azure Machine Learning service to build, deploy, and use models. This is made possible by wrapping Python-based machine learning training and inferencing scripts in SQL stored procedures. This approach is well suited for data residing in SQL Server tables and works with any version of SQL Server that supports SQL Server Machine Learning Services.

8. Automated machine learning in Spark

HDInsight has been integrated with automated machine learning. With this integration, customers who use automated machine learning can now process massive amounts of data and get all the benefits of a broad, open source ecosystem with the global scale of Azure to run automated machine learning experiments. HDInsight allows customers to provision clusters with hundreds of nodes. Automated machine learning running on Apache Spark in the HDInsight cluster allows users to use compute capacity across these nodes to run training jobs at scale, as well as to run multiple training jobs in parallel. This allows users to run automated machine learning experiments while sharing the compute with their other big data workloads. To find out more, see the GitHub notebooks and documentation.

We support automated machine learning on Azure Databricks clusters with a simple installation of the SDK in the cluster. You can get started by visiting the “Azure Databricks” section in our documentation, “Configure a development environment for Azure Machine Learning.”

Improved accuracy and performance

Since we announced general availability back in December, we have added several new capabilities to generate high quality models in a shorter amount of time.

  • An intelligent stopping capability that automatically figures out when to stop an experiment based on progress made on the primary metric. If no significant improvement is seen in the primary metric, the experiment is automatically stopped, saving you time and compute.

  • With the goal of exploring a greater number of model pipelines in a given amount of time, users can leverage a sub-sampling strategy to train much faster while minimizing any loss in model quality.

  • Specify preprocess=True to intelligently search across different featurization strategies and find the best one for the specified data, with the goal of getting to a better model. Learn more about the various preprocessing/featurization steps.

  • XGBoost has been added to the set of learners that automated machine learning explores, as we see XGBoost models performing well.

  • Improved support for larger datasets, currently supporting datasets up to 10 GB in size.
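The intelligent stopping idea from the first bullet can be sketched in a few lines of plain Python. This is only an illustration of a generic patience-based stopping rule, not the logic automated machine learning actually uses. The function name run_with_early_stopping, the patience and min_delta parameters, and the sample scores are all hypothetical.

```python
def run_with_early_stopping(scores, patience=3, min_delta=0.001):
    """Consume per-iteration primary-metric scores (higher is better).

    Stops once `patience` consecutive iterations fail to beat the best
    score so far by more than `min_delta`.
    Returns (best_score, iterations_run).
    """
    best = float("-inf")
    stale = 0            # consecutive iterations without significant gain
    iterations_run = 0
    for score in scores:
        iterations_run += 1
        if score > best + min_delta:
            best = score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break    # no significant progress: stop and save compute
    return best, iterations_run


if __name__ == "__main__":
    # Hypothetical metric values from successive model pipelines.
    history = [0.70, 0.75, 0.80, 0.8004, 0.799, 0.8002, 0.90]
    print(run_with_early_stopping(history))
```

In this made-up history the run stops after six iterations, so the seventh pipeline is never trained. That is the trade-off of any such rule: a larger patience explores more pipelines at the cost of more compute.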

Learn more

Automated machine learning makes machine learning more accessible for data scientists of all levels of experience. Get started by visiting our documentation and let us know what you think. We are committed to making automated machine learning better for you!

Learn more about the Azure Machine Learning service.

Get started with a free trial of the Azure Machine Learning service.