Exploring open-source capabilities in Azure AI

This post was co-authored by Richard Tso, Director of Product Marketing, Azure AI

Open-source technologies have had a profound impact on the world of AI and machine learning, enabling developers, data scientists, and organizations to collaborate, innovate, and build better AI solutions. As large AI models like GPT-3.5 and DALL-E become more prevalent, organizations are also exploring ways to leverage existing open-source models and tools without needing to put a tremendous amount of effort into building them from scratch. Microsoft Azure AI is leading this effort by working closely with GitHub and data science communities, and providing organizations with access to a rich set of open-source technologies for building and deploying cutting-edge AI solutions.

At Azure Open Source Day, we highlighted Microsoft’s commitment to open source and how to build intelligent apps faster and with more flexibility using the latest open-source technologies that are available in Azure AI.

Build and operationalize open-source State-of-the-Art models in Azure Machine Learning

Recent advancements in AI have propelled the rise of large foundation models that are trained on vast quantities of data and can be easily adapted to a wide variety of applications across industries. This emerging trend provides a unique opportunity for enterprises to build and use foundation models in their deep learning workloads.

Today, we’re announcing the upcoming public preview of foundation models in Azure Machine Learning. This preview gives Azure Machine Learning native capabilities that enable customers to build and operationalize open-source foundation models at scale. With these new capabilities, organizations will get access to curated environments and Azure AI Infrastructure without having to manually manage and optimize dependencies. Machine learning professionals can easily start their data science tasks to fine-tune and deploy foundation models from multiple open-source repositories, starting with Hugging Face, using Azure Machine Learning components and pipelines. The service will provide a comprehensive repository of popular open-source models for tasks such as natural language processing, vision, and multi-modality through the Azure Machine Learning built-in registry. Users can not only deploy these pre-trained models and run inference with them directly, but also fine-tune them for supported machine learning tasks using their own data, and import any other models directly from the open-source repository.

The next generation of Azure Cognitive Services for Vision

Today, Azure Cognitive Services for Vision released its next generation of capabilities powered by the Florence large foundation model. This new Microsoft model delivers significant improvements to image captioning and groundbreaking customization capabilities with few-shot learning. Until today, model customization required large datasets with hundreds of images per label to achieve production quality for vision tasks. Florence, however, is trained on billions of text-image pairs, allowing custom models to achieve high quality with just a few images. This lowers the hurdle for creating models that fit challenging use cases where training data is limited.

Users can try the new capabilities of Vision underpinned by the Florence model through Vision Studio. This tool demonstrates a full set of prebuilt vision tasks, including automatic captioning, smart cropping, image classification, video summarization with natural language, and much more. Users can also see how the tool helps track movements, analyze environments, and provide real-time alerts.

The image is an example of Azure Cognitive Services Vision UI, using the Florence model for a video summarization task.

To learn more about the new Florence model in Azure Cognitive Services for Vision, please check out this announcement blog.

New Responsible AI Toolbox additions

Responsible AI is a critical consideration for organizations building and deploying AI solutions. Last year, Microsoft launched the Responsible AI Dashboard within the Responsible AI Toolbox, a suite of tools for a customized, responsible AI experience with unique and complementary functionalities available on GitHub and in Azure Machine Learning. We recently announced the addition of two new open-source tools designed to make the adoption of responsible AI practices more practical.

The Responsible AI Mitigations Library allows practitioners to experiment with different mitigation techniques more easily, while the Responsible AI Tracker uses visualizations to demonstrate the effectiveness of different mitigations for more informed decision-making. The new mitigations library offers a means of managing failures that occur in data preprocessing. It complements the toolbox’s Fairlearn fairness assessment tool, which focuses on mitigations applied during training time. The tracker allows practitioners to look at performance for subsets of data across iterations of a model to help them determine the most appropriate model for deployment. When used with other tools in the Responsible AI Toolbox, the two additions offer a more efficient and effective means to help improve the performance of systems across users and conditions. Both tools are available as open source on GitHub and are integrated into Azure Machine Learning.

The image shows an example UI of the Responsible AI Tracker, visualizing the model performance across multiple iterations with red and green color.
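The idea behind the tracker, comparing performance on subsets of data across iterations of a model, can be illustrated with a minimal sketch in plain Python. This is not the Responsible AI Tracker's API; the function name, the cohort labels, and the prediction records below are all hypothetical and exist only to show the kind of comparison involved.

```python
from collections import defaultdict


def accuracy_by_cohort(records):
    """Return accuracy per cohort for (cohort, y_true, y_pred) records."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for cohort, y_true, y_pred in records:
        total[cohort] += 1
        correct[cohort] += int(y_true == y_pred)
    return {cohort: correct[cohort] / total[cohort] for cohort in total}


# Hypothetical predictions from two model iterations on the same four rows.
iterations = {
    "baseline": [("A", 1, 1), ("A", 0, 0), ("B", 1, 0), ("B", 0, 0)],
    "mitigated": [("A", 1, 1), ("A", 0, 1), ("B", 1, 1), ("B", 0, 0)],
}

# One accuracy table per iteration, broken down by cohort.
report = {name: accuracy_by_cohort(rows) for name, rows in iterations.items()}
for name, scores in report.items():
    print(name, scores)
```

In this made-up data both iterations get three of four rows right overall, yet the per-cohort view shows that they fail on different cohorts. That kind of difference is what a subset-level comparison is meant to surface before choosing a model for deployment.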

Accelerate large-scale AI with Azure AI infrastructure

Azure AI Infrastructure provides massive scale-up and scale-out capabilities for the most advanced AI workloads in the world. This is a key reason why leading AI companies, including our partners at OpenAI, continue to choose Azure to advance their AI innovation. Our results for training OpenAI’s GPT-3 on Azure AI Infrastructure using Azure NDm A100 v4 virtual machines with NVIDIA’s open-source framework, NVIDIA NeMo Megatron, delivered a 530B-parameter benchmark on 175 virtual machines, resulting in a scalability factor of 95 percent. When Azure AI infrastructure is used together with a managed end-to-end machine learning platform, such as Azure Machine Learning, it provides the vast compute needed to enable organizations to streamline management and orchestration of large AI models and help bring them into production.

The full benchmarking report for GPT-3 models with the NVIDIA NeMo Megatron framework on Azure AI infrastructure is available here.
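The post does not define how the 95 percent scalability factor was computed. One common definition of scaling efficiency is the measured speedup divided by the ideal (linear) speedup when going from a smaller baseline cluster to a larger one. The sketch below assumes that definition. The throughput values and the 35-VM baseline are hypothetical, and only the 175-VM count comes from the text above.

```python
def scaling_efficiency(base_nodes, base_throughput, nodes, throughput):
    """Ratio of measured speedup to ideal (linear) speedup."""
    if min(base_nodes, nodes) <= 0 or base_throughput <= 0:
        raise ValueError("node counts and baseline throughput must be positive")
    measured_speedup = throughput / base_throughput
    ideal_speedup = nodes / base_nodes
    return measured_speedup / ideal_speedup


# Hypothetical throughputs in arbitrary units; only the 175-VM count is from the post.
eff = scaling_efficiency(base_nodes=35, base_throughput=1000.0,
                         nodes=175, throughput=4750.0)
print(f"{eff:.0%}")  # 95%
```

Under this definition, 95 percent means that adding five times the virtual machines yields 4.75 times the throughput instead of the ideal 5 times.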

Optimized training framework to accelerate PyTorch model development

Azure is a preferred platform for PyTorch, a widely used open-source framework. At Microsoft Ignite, we launched Azure Container for PyTorch (ACPT) within Azure Machine Learning, bringing together the latest PyTorch version with our best optimization software for training and inferencing, such as DeepSpeed and ONNX Runtime, all tested and optimized for Azure. All these components are already installed in ACPT and validated to reduce setup costs and accelerate training time for large deep learning workloads. The ACPT curated environment allows our customers to efficiently train PyTorch models. Optimization libraries such as ONNX Runtime and DeepSpeed, composed within the container, can deliver speedups of 54 percent to 163 percent over regular PyTorch workloads, as seen on various Hugging Face models.

The chart shows that ACPT, which combines ONNX Runtime and DeepSpeed, can deliver speedups of 54 percent to 163 percent over regular PyTorch workloads.

This month, we’re bringing a new capability to ACPT: Nebula. Nebula is a component in ACPT that helps data scientists save checkpoints faster than existing solutions for distributed large-scale model training jobs with PyTorch. Nebula is fully compatible with different distributed PyTorch training strategies, including PyTorch Lightning, DeepSpeed, and more. In saving medium-sized Hugging Face GPT2-XL checkpoints (20.6 GB), Nebula achieved a 96.9 percent reduction in single checkpointing time. The speed gain can grow further with model size and GPU count. Our results show that, with Nebula, the time to save a 97 GB checkpoint in a training job on 128 NVIDIA A100 GPUs can be reduced from 20 minutes to 1 second. By reducing checkpoint times from hours to seconds, a potential reduction of 95 percent to 99.9 percent, Nebula makes frequent checkpointing practical and shortens end-to-end training time in large-scale training jobs.

The chart shows that Nebula achieved a 96.9 percent reduction in single checkpointing time with GPT2-XL.
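The checkpoint figures above can be checked with a little arithmetic. The following sketch uses two hypothetical helper functions: one converts a before-and-after time into a percent reduction, the other applies a percent reduction to a baseline time. The 20-minute and 1-second values come from the text. The 100-second baseline in the second call is an assumption chosen only to make the 96.9 percent figure concrete.

```python
def percent_reduction(before_seconds, after_seconds):
    """Percent by which a duration shrank going from before to after."""
    if before_seconds <= 0:
        raise ValueError("before_seconds must be positive")
    return 100.0 * (before_seconds - after_seconds) / before_seconds


def remaining_time(before_seconds, reduction_percent):
    """Duration left after applying a percent reduction to a baseline."""
    return before_seconds * (100.0 - reduction_percent) / 100.0


# 97 GB checkpoint from the post: 20 minutes down to 1 second.
print(f"{percent_reduction(20 * 60, 1):.2f}")  # 99.92

# Hypothetical 100-second baseline with the 96.9 percent reduction reported for GPT2-XL.
print(f"{remaining_time(100, 96.9):.1f}")  # 3.1
```

The first result, roughly 99.9 percent, matches the upper end of the reduction range quoted above.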

To learn more about Azure Container for PyTorch, please check out this announcement blog.

MLflow 2.0 and Azure Machine Learning

MLflow is an open-source platform for the complete machine learning lifecycle, from experimentation to deployment. As one of the MLflow contributors, Azure Machine Learning has made its workspaces MLflow-compatible, which means organizations can use Azure Machine Learning workspaces in the same way that they use an MLflow tracking server. MLflow recently released a new version, MLflow 2.0, which refreshes the core platform APIs based on extensive feedback from MLflow users and customers. The refresh simplifies the platform experience for data science and machine learning operations workflows. We’re excited to announce that MLflow 2.0 is also supported in Azure Machine Learning workspaces.

Read this blog to learn more about what you can do with MLflow 2.0 in Azure Machine Learning.
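Using a workspace "in the same way" as an MLflow tracking server means, in practice, pointing the MLflow client at a tracking URI and logging as usual. A minimal sketch of that pattern follows. The helper name `log_training_run` and its arguments are hypothetical. The calls it makes (`set_tracking_uri`, `set_experiment`, `start_run`, `log_params`, `log_metrics`) are standard MLflow tracking functions. The `mlflow` module is passed in as an argument so the sketch can be exercised with a stand-in when MLflow or a workspace is not available.

```python
def log_training_run(mlflow, tracking_uri, experiment, params, metrics):
    """Log one run's parameters and metrics to the given MLflow tracking URI.

    `mlflow` is the imported mlflow module, or a stand-in exposing the same
    functions, so this sketch has no hard dependency at import time.
    """
    mlflow.set_tracking_uri(tracking_uri)  # point the client at the tracking server
    mlflow.set_experiment(experiment)      # create or select the experiment
    with mlflow.start_run():               # one run groups the logged values
        mlflow.log_params(params)          # dict of hyperparameters
        mlflow.log_metrics(metrics)        # dict of numeric results
```

With the `mlflow` package installed, a call could look like `log_training_run(mlflow, tracking_uri, "demo-experiment", {"lr": 0.01}, {"accuracy": 0.9})`, where `tracking_uri` is assumed to be the MLflow tracking URI of the workspace, or a local path when experimenting offline. Connecting to an Azure Machine Learning workspace typically also requires the `azureml-mlflow` plugin package.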

Azure AI is empowering developers and organizations to build cutting-edge AI solutions with its rich set of open-source technologies. From leveraging pre-trained models and customizing AI capabilities with new technologies like Hugging Face foundation models, to integrating responsible AI practices with new open-source tools, Azure AI is driving innovation and efficiency in the AI industry. With Azure AI infrastructure, organizations can accelerate their large-scale AI workloads and achieve even greater results. Read this blog and watch the on-demand session to take a deep dive into the open-source projects and features we announced at Azure Open Source Day 2023.

We’d like to conclude this blog post with a few customer examples. What is most important about these announcements is the creative and transformative ways our customers are combining open-source technologies to build their own AI solutions and transform their businesses.

Customers innovating with open-source on Azure AI

Elekta is a company that provides technology, software, and services for cancer treatment providers and researchers. Elekta considers AI essential to expanding the use and availability of radiotherapy treatments. AI technology helps accelerate the overall treatment planning process and monitors patient movement in real time during treatment. Elekta uses Azure cloud infrastructure for the storage and compute resources needed for its AI-enabled solutions. Elekta relies heavily on Azure Machine Learning, Azure Virtual Machines, and the PyTorch open-source machine learning framework to create virtual machines and optimize its neural networks. Read full story.

The National Basketball Association (NBA) is using AI and open-source technologies to enhance its fan experience. The NBA and Microsoft have partnered to create a direct-to-consumer platform that offers more personalized and engaging content to fans. The NBA uses an AI-driven data analysis system, NBA CourtOptix, which uses player tracking and spatial position information to derive insights into the games. The system is powered by Microsoft Azure, including Azure Data Lake Storage, Azure Machine Learning, MLflow, and Delta Lake, among others. The goal is to turn the vast amounts of data into actionable insights that fans can understand and engage with. The NBA also hopes to strengthen its direct relationship with fans and increase engagement through more personalized content delivery and marketing efforts. Read full story.

AXA, a leading car insurance company in the United Kingdom, needed to streamline the management of its online quotes to keep up with the fast-paced digital marketplace. With 30 million car insurance quotes processed daily, the company sought a solution to speed up deployment of new pricing models. In 2020, the AXA data science team discovered managed endpoints in Azure Machine Learning and adopted the technology during private preview. The team tested ONNX open-source models deployed through managed endpoints and achieved a substantial reduction in response time. The company intends to use Azure Machine Learning to deliver value, relevance, and personalization to customers and establish a more efficient and agile process. Read full story.


Takuto Higuchi
