Machine Learning model training with AKS

Training models on large datasets is a complex, resource-intensive task. Use familiar tools such as TensorFlow and Kubeflow to simplify the training of machine learning (ML) models. Your ML models run in Azure Kubernetes Service (AKS) clusters backed by GPU-enabled VMs.

  1. Package the ML model into a container and publish it to Azure Container Registry (ACR).
  2. Azure Blob storage hosts the training data sets and the trained model.
  3. Use Kubeflow to deploy the training job to AKS. The distributed training job includes parameter servers and worker nodes.
  4. Serve the production model using Kubeflow, promoting a consistent environment across test, control, and production.
  5. AKS supports GPU-enabled VMs.
  6. Developers can build features that query the model running in the AKS cluster.
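
The training steps above can be sketched as a Kubeflow TFJob manifest, built here in Python as a plain dictionary. This is only an illustrative sketch, not the configuration used by this architecture: it assumes the Kubeflow training operator (which provides the TFJob resource) is installed in the AKS cluster and that the GPU nodes expose the `nvidia.com/gpu` resource. The job name `mnist-train`, the registry `myregistry.azurecr.io`, the image tag, and the replica counts are all hypothetical placeholders.

```python
import json


def build_tfjob_manifest(name, image, ps_replicas=1, worker_replicas=2,
                         gpus_per_worker=1):
    """Return a Kubeflow TFJob manifest (as a dict) with PS and Worker replicas.

    All argument values are illustrative; adjust them to your own cluster.
    """

    def replica_spec(replicas, gpus):
        # The TFJob operator expects the training container to be named "tensorflow".
        container = {"name": "tensorflow", "image": image}
        if gpus > 0:
            # Request GPUs so the pod is scheduled onto a GPU-enabled node.
            container["resources"] = {"limits": {"nvidia.com/gpu": gpus}}
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [container]}},
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "TFJob",
        "metadata": {"name": name},
        "spec": {
            "tfReplicaSpecs": {
                # Parameter servers hold the shared model state; no GPU requested here.
                "PS": replica_spec(ps_replicas, 0),
                # Workers run the training computation on GPUs.
                "Worker": replica_spec(worker_replicas, gpus_per_worker),
            }
        },
    }


# Hypothetical image published to ACR in step 1.
manifest = build_tfjob_manifest(
    name="mnist-train",
    image="myregistry.azurecr.io/mnist-train:v1",
)
manifest_json = json.dumps(manifest, indent=2)
print(manifest_json)
```

Because `kubectl` accepts JSON as well as YAML, the printed output could be saved to a file and submitted with `kubectl apply -f`. Access to the training data in Azure Blob storage (credentials, mount points, or environment variables) is deliberately left out of this sketch, since it depends on how the cluster and storage account are set up.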