Skin Cancer Image Classification with Azure Machine Learning (AML) package for Computer Vision (AMLPCV) and Team Data Science Process (TDSP)

NOTE This content is no longer maintained. Visit the Azure Machine Learning Notebook project for sample Jupyter notebooks for ML and deep learning with Azure Machine Learning.

Introduction

This article shows how to use the Azure Machine Learning Package for Computer Vision (AMLPCV) to train, test, and deploy an image classification model. The sample uses the TDSP structure and templates in Azure Machine Learning Workbench. The complete sample is provided in this walkthrough. It uses CNTK as the deep learning framework, and training is performed on a Data Science VM GPU machine. Deployment uses the Azure ML Operationalization CLI.

Many applications in the computer vision domain can be framed as image classification problems. These include building models that answer questions such as "Is an object present in the image?" (where the object could be a dog, car, or ship) and more complex questions such as "What class of eye disease severity is shown in this patient's retinal scan?" AMLPCV streamlines the data processing and modeling pipeline for image classification.

Team Data Science Process (TDSP) Walkthrough with AMLPCV

This walkthrough uses the Team Data Science Process (TDSP) lifecycle.

The walkthrough covers the following lifecycle steps:

1. Data acquisition

The ISIC dataset is used for the image classification task. ISIC (the International Skin Imaging Collaboration) is a partnership between academia and industry that facilitates the application of digital skin imaging to study and help reduce melanoma mortality. The ISIC archive contains over 13,000 skin lesion images, each labeled as either benign or malignant. We download a sample of the images from the ISIC archive.

2. Modeling

In the modeling step, the following substeps are performed.

2.1 Dataset Creation

In order to generate a Dataset object in AMLPCV, provide a root directory of images on the local disk.
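As a rough illustration of the kind of directory layout this implies, the sketch below indexes a root folder in plain Python rather than through AMLPCV. It assumes one subfolder per label (for example, benign and malignant); that layout and the function name index_image_dir are assumptions made for illustration, not the package's API.

```python
import os

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def index_image_dir(root_dir):
    """Return a sorted list of (image_path, label) pairs.

    Assumes (hypothetically) that each immediate subfolder of root_dir
    is named after a label and holds that label's images.
    """
    pairs = []
    for label in sorted(os.listdir(root_dir)):
        label_dir = os.path.join(root_dir, label)
        if not os.path.isdir(label_dir):
            continue  # skip stray files at the root
        for name in sorted(os.listdir(label_dir)):
            # Keep only files with a recognized image extension.
            if os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS:
                pairs.append((os.path.join(label_dir, name), label))
    return pairs
```

Calling index_image_dir on a folder that contains benign and malignant subfolders yields one (path, label) pair per image, which is the information a dataset object needs to get started.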

2.2 Image Visualization and Annotation

Visualize the images in the dataset object, and correct some of the labels if necessary.

2.3 Image Augmentation

Augment a dataset object using the transformations described in the imgaug library.
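To make the idea of augmentation concrete, here is a minimal sketch in plain Python. It does not use imgaug or AMLPCV; it treats an image as a list of pixel rows and produces a mirrored copy and a rotated copy that keep the original label. The function names are hypothetical, and the choice of a flip and a 90-degree rotation is an assumption about which transformations are typically label-preserving for lesion images.

```python
def flip_horizontal(image):
    """Mirror a row-major image (a list of rows) left to right."""
    return [list(reversed(row)) for row in image]


def rotate_90_clockwise(image):
    """Rotate a row-major image 90 degrees clockwise."""
    # Reversing the row order and transposing gives a clockwise rotation.
    return [list(row) for row in zip(*image[::-1])]


def augment(image, label):
    """Return the original plus a flipped and a rotated copy, all sharing the label."""
    variants = [image, flip_horizontal(image), rotate_90_clockwise(image)]
    return [(variant, label) for variant in variants]
```

Each call to augment turns one labeled image into three labeled images, which is the basic effect augmentation has on the size of a training set.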

2.4 DNN Model Definition

Define the model architecture used in the training step. Six different pre-trained deep neural network models are supported in AMLPCV: AlexNet, ResNet-18, ResNet-34, ResNet-50, ResNet-101, and ResNet-152.

2.5 Classifier Training

Train the neural networks with default or custom parameters.

2.6 Evaluation and Visualization

This substep provides functionality to evaluate the performance of the trained model on an independent test dataset. The evaluation metrics include accuracy, precision, recall, and the ROC curve.
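For readers who want to see how these metrics are defined, the following sketch computes them in plain Python for a two-class problem. It is not AMLPCV's evaluation code; the function names are hypothetical, and treating malignant as the positive class is an assumption made for illustration.

```python
def confusion_counts(y_true, y_pred, positive="malignant"):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy_precision_recall(y_true, y_pred, positive="malignant"):
    """Return (accuracy, precision, recall); undefined ratios are reported as 0.0."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


def roc_points(y_true, scores, positive="malignant"):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    scores are the model's scores for the positive class; thresholds are
    swept from the highest score down to the lowest.
    """
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [positive if s >= threshold else None for s in scores]
        tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
        fpr = fp / (fp + tn) if fp + tn else 0.0
        tpr = tp / (tp + fn) if tp + fn else 0.0
        points.append((fpr, tpr))
    return points
```

Plotting the pairs returned by roc_points, with the false positive rate on the horizontal axis, gives the ROC curve.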

These substeps are explained in detail in the corresponding Jupyter notebook. We also provide guidelines for tuning parameters such as learning rate, mini-batch size, and dropout rate to further improve model performance.
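One simple way to organize such tuning is a grid search over candidate values. The sketch below only enumerates the combinations and picks the best one according to a scoring function that you supply; the function names, the dictionary keys, and the candidate values are all hypothetical, and the scoring function stands in for a real train-and-validate run.

```python
from itertools import product


def parameter_grid(learning_rates, minibatch_sizes, dropout_rates):
    """Return every combination of the candidate values as a list of dicts."""
    return [
        {"learning_rate": lr, "minibatch_size": mb, "dropout_rate": dr}
        for lr, mb, dr in product(learning_rates, minibatch_sizes, dropout_rates)
    ]


def select_best(grid, evaluate):
    """Return the parameter set with the highest score.

    evaluate is a caller-supplied function that trains with the given
    parameters and returns a validation score (higher is better).
    """
    return max(grid, key=evaluate)


# Illustrative candidate values only; suitable ranges depend on the model and data.
grid = parameter_grid([0.01, 0.001], [16, 32], [0.0, 0.5])
```

Because every combination triggers a full training run, the grid is usually kept small when training deep networks.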

3. Deployment

This step operationalizes the model produced in the modeling step. It introduces the operationalization prerequisites and setup, and then explains how to consume the resulting web service. Through this tutorial, you can learn to build deep learning models with AMLPCV and operationalize them in Azure.
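Consuming an image classification web service generally means posting an encoded image over HTTP. The sketch below builds such a request with the Python standard library. The payload field name image_base64, the bearer-token header, and the function name are assumptions for illustration; the actual input schema and authentication are defined by the deployed service.

```python
import base64
import json
import urllib.request


def build_scoring_request(image_bytes, service_url, api_key=None):
    """Build a POST request that carries one base64-encoded image as JSON.

    The payload schema here is hypothetical; match it to the schema of
    the service you deployed.
    """
    payload = {"image_base64": base64.b64encode(image_bytes).decode("ascii")}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = "Bearer " + api_key
    return urllib.request.Request(
        service_url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen sends it to the service; the response body can then be decoded as JSON if the service returns JSON.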

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (for example, label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Data/Telemetry

The image classification TDSP project collects usage data and sends it to Microsoft.

References