What is the real problem?
When creating a machine learning model, data scientists across all industry segments face the same challenges: defining and tuning hyperparameters, and deciding which algorithm to use.
If a customer plans to create an ML model that predicts the price of a car, the data scientist needs to pick the right algorithm and hyperparameters. Narrowing down to the best combination is a time-consuming process. This has been a challenge for Microsoft’s customers across all verticals, so Microsoft recently launched an Azure Machine Learning Python SDK that includes an automated machine learning module. The automated ML module helps not only with defining and tuning hyperparameters but also with picking the right algorithm!
What is automated ML?
Automated ML helps create high-quality models using intelligent automation and optimization. It figures out the right algorithm and hyperparameters to use. It is a tool that improves the efficiency of data scientists!
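To make the idea concrete, here is a minimal conceptual sketch in plain Python of what "figuring out the right algorithm and hyperparameters" means: score several candidates with cross-validation and keep the one with the best metric. It is not how automated ML is implemented (its search is described only as intelligent automation and optimization, while this loop simply tries every candidate). The toy data set and the helper names make_knn, nearest_centroid and cross_val_accuracy are hypothetical and exist only for this illustration.

```python
import random

def squared_distance(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def make_knn(k):
    """Return a predict function for k-nearest neighbours with the given k."""
    def predict(train, point):
        nearest = sorted(train, key=lambda row: squared_distance(row[0], point))[:k]
        votes = [label for _, label in nearest]
        return max(set(votes), key=votes.count)  # majority vote (k is odd)
    return predict

def nearest_centroid(train, point):
    """Predict the label whose class mean is closest to the point."""
    centroids = {}
    for label in {lab for _, lab in train}:
        members = [features for features, lab in train if lab == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]
    return min(centroids, key=lambda lab: squared_distance(centroids[lab], point))

def cross_val_accuracy(predict, data, n_folds):
    """Mean accuracy over n_folds train/validation splits."""
    scores = []
    for fold in range(n_folds):
        validation = data[fold::n_folds]
        train = [row for i, row in enumerate(data) if i % n_folds != fold]
        hits = sum(predict(train, features) == label for features, label in validation)
        scores.append(hits / len(validation))
    return sum(scores) / n_folds

# Hypothetical toy data: two noisy clusters, 30 points per class.
rng = random.Random(0)
data = [((rng.gauss(center, 1.0), rng.gauss(center, 1.0)), label)
        for label, center in ((0, 0.0), (1, 2.5)) for _ in range(30)]
rng.shuffle(data)

# Each candidate pairs an algorithm with one hyperparameter setting.
candidates = {
    "kNN(k=1)": make_knn(1),
    "kNN(k=5)": make_knn(5),
    "NearestCentroid": nearest_centroid,
}

# Evaluate every candidate on the same metric and keep the best one.
results = {name: cross_val_accuracy(fn, data, n_folds=3)
           for name, fn in candidates.items()}
best_name = max(results, key=results.get)
print(results)
print("best:", best_name)
```

Doing this by hand for realistic algorithms and hyperparameter ranges is exactly the time-consuming part that automated ML is meant to take over.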
Automated ML’s current capabilities
Automated ML currently supports two problem spaces: regression and classification. Additional problem spaces, such as clustering, will be supported in future releases. For data pre-processing, automated ML supports one-hot encoding (converting a categorical variable to a binary vector) and assigning values to missing fields. It currently supports the Python language and the scikit-learn framework. To train the model, you can use a laptop or desktop, Azure Batch AI, or an Azure DSVM. All data formats supported by scikit-learn are currently supported.
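For readers new to these two pre-processing steps, the following plain-Python sketch shows what they do to the data. It illustrates the concepts only and is not automated ML's own implementation. The car data, the helper names one_hot_encode and impute_mean, and the choice of the mean as the fill value are all assumptions made for this example.

```python
def one_hot_encode(values):
    """Map each categorical value to a binary vector containing a single 1."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[index[value]] = 1
        encoded.append(row)
    return categories, encoded

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]

# Hypothetical car data, in the spirit of a price-prediction model.
fuel_types = ["diesel", "petrol", "electric", "petrol"]
mileage = [42000, None, 15000, 63000]

categories, fuel_encoded = one_hot_encode(fuel_types)
mileage_filled = impute_mean(mileage)

print(categories)      # ['diesel', 'electric', 'petrol']
print(fuel_encoded)    # [[1, 0, 0], [0, 0, 1], [0, 1, 0], [0, 0, 1]]
print(mileage_filled)  # [42000, 40000.0, 15000, 63000]
```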
High-level steps to execute automated ML methods
Data scientists can run automated ML from different environments. Two options are Azure Notebooks and a local conda environment. You can find more details in our documentation.
The two key pieces involved in automated ML are the AutoMLConfig object and the submit method.
AutoMLConfig(params) -> Specify the number of iterations, the metric to optimize, etc.
Example
from azureml.train.automl import AutoMLConfig

# X_digits, y_digits and project_folder are assumed to be defined earlier.
Automl_config = AutoMLConfig(task = 'classification',
                             primary_metric = 'AUC_weighted',
                             max_time_sec = 12000,
                             iterations = 20,
                             n_cross_validations = 3,
                             preprocess = False,
                             exit_score = 0.995,
                             blacklist_algos = ['kNN','LinearSVM'],
                             X = X_digits,
                             y = y_digits,
                             path = project_folder)
experiment.submit(config) -> Takes the automated ML configuration object as a parameter and starts the run
Example
from azureml.core.experiment import Experiment

# ws (the workspace) and experiment_name are assumed to be defined earlier.
experiment = Experiment(ws, experiment_name)
local_run = experiment.submit(Automl_config, show_output=True)
The final step is to operationalize the most performant model.
Availability
The SDK will be publicly available after the Ignite conference, which ends on 28 September 2018. It will be available in several Azure regions, including westcentralus, eastus2, and westeurope.
Conclusion
Automated ML is a leap towards the future of data science. It makes data scientists in any organization more efficient, because it automatically runs multiple iterations of an experiment. It also lets both new and experienced data scientists explore different algorithms and select and tune hyperparameters, because automated ML does much of this work for them. It is worth noting that data scientists can start on a local machine using the Azure ML Python SDK, which includes automated ML. They can then use the power of the cloud to run the training iterations with technologies such as Azure Batch AI or Azure DSVM.
Further reading
Some of the modules in the Azure ML Python SDK are already in public preview, and you can find more details in our documentation. You can also get started with automated ML using our documentation. We have published some examples on GitHub as well.