The trend toward the use of massive AI models to power a large number of tasks is changing how AI is built. At Microsoft Build 2020, we shared our vision for AI at Scale utilizing state-of-the-art AI supercomputing in Azure and a new class of large-scale AI models enabling next-generation AI. The advantage of large-scale models is that they only need to be trained once with massive amounts of data using AI supercomputing, enabling them to then be “fine-tuned” for different tasks and domains with much smaller datasets and resources. The more parameters a model has, the better it can capture the difficult nuances of the data, as demonstrated by our 17-billion-parameter Turing Natural Language Generation (T-NLG) model and its ability to understand language well enough to answer questions about, or summarize, documents it is seeing for the first time. Natural language models like this, significantly larger than the state-of-the-art models a year ago, and many orders of magnitude the size of earlier image-centric models, are now powering a variety of tasks throughout Bing, Word, Outlook, and Dynamics.
Training models at this scale requires large clusters of hundreds of machines with specialized AI accelerators interconnected by high-bandwidth networks inside and across the machines. We have been building such clusters in Azure to enable new natural language generation and understanding capabilities across Microsoft products, and to power OpenAI on their mission to build safe artificial general intelligence. Our latest clusters provide so much aggregated compute power that they are referred to as AI supercomputers, with the one built for OpenAI ranking among the top five publicly disclosed supercomputers in the world. Using this supercomputer, OpenAI unveiled in May their 175-billion-parameter GPT-3 model and its ability to support a wide range of tasks it wasn’t specifically trained for, including writing poetry and translation.
The work that we have done on large-scale compute clusters, leading network design, and the software stack that manages them, including Azure Machine Learning, ONNX Runtime, and other Azure AI services, is directly aligned with our AI at Scale strategy. The innovation generated through this process is ultimately making Azure better at supporting the AI needs of all our customers, irrespective of their scale. For example, with the NDv2 VM series, Azure was the first and only public cloud offering clusters of VMs with NVIDIA’s V100 Tensor Core GPUs, connected by high-bandwidth low-latency NVIDIA Mellanox InfiniBand networking. A good analogy is how automotive technology is pioneered in the high-end racing industry and then makes its way into the cars that we drive every day.
New frontiers with unprecedented scale
“Advancing AI toward general intelligence requires, in part, powerful systems that can train increasingly more capable models. The computing capability required was just not possible until recently. Azure AI and its supercomputing capabilities provide us with leading systems that help accelerate our progress” – Sam Altman, OpenAI CEO
In our continuum of Azure innovation, we’re excited to announce the new ND A100 v4 VM series, our most powerful and massively scalable AI VM, available on demand from eight to thousands of interconnected NVIDIA GPUs across hundreds of VMs.
The ND A100 v4 VM series starts with a single virtual machine (VM) and eight NVIDIA Ampere A100 Tensor Core GPUs, but just like the human brain is composed of interconnected neurons, our ND A100 v4-based clusters can scale up to thousands of GPUs with an unprecedented 1.6 Tb/s of interconnect bandwidth per VM. Each GPU is provided with its own dedicated topology-agnostic 200 Gb/s NVIDIA Mellanox HDR InfiniBand connection. Tens, hundreds, or thousands of GPUs can then work together as part of a Mellanox InfiniBand HDR cluster to achieve any level of AI ambition. Any AI goal (training a model from scratch, continuing its training with your own data, or fine-tuning it for your desired tasks) will be achieved much faster with dedicated GPU-to-GPU bandwidth 16x higher than any other public cloud offering.
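As a rough illustration of the arithmetic behind these figures, the short Python sketch below multiplies the eight GPUs per VM by the 200 Gb/s InfiniBand link each GPU receives to arrive at the 1.6 Tb/s per-VM number, and then scales GPU counts up for a cluster. The function names and the "aggregate" figure are our own illustrative assumptions: the aggregate is simply the nominal sum of link rates across VMs, not a measured or guaranteed throughput.

```python
# Back-of-the-envelope sizing for an ND A100 v4 cluster.
# The two constants come from the text above; everything else
# (function names, the "aggregate" figure) is illustrative only.

GPUS_PER_VM = 8          # NVIDIA A100 GPUs in one ND A100 v4 VM
LINK_GBPS_PER_GPU = 200  # dedicated HDR InfiniBand link per GPU, in Gb/s


def vm_interconnect_gbps(gpus_per_vm=GPUS_PER_VM, link_gbps=LINK_GBPS_PER_GPU):
    """Nominal interconnect bandwidth of one VM, in Gb/s."""
    return gpus_per_vm * link_gbps


def cluster_summary(num_vms):
    """Nominal GPU count and link-rate totals for a cluster of num_vms VMs."""
    if num_vms < 1:
        raise ValueError("num_vms must be at least 1")
    per_vm_gbps = vm_interconnect_gbps()
    return {
        "vms": num_vms,
        "gpus": num_vms * GPUS_PER_VM,
        "per_vm_tbps": per_vm_gbps / 1000,
        # Sum of nominal link rates across all VMs; not a measured throughput.
        "aggregate_tbps": num_vms * per_vm_gbps / 1000,
    }


if __name__ == "__main__":
    for vms in (1, 16, 125):
        print(cluster_summary(vms))
```

Running the script for 1, 16, and 125 VMs shows how a single eight-GPU VM grows to 128 and then 1,000 GPUs while the per-VM figure stays at 1.6 Tb/s.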
The ND A100 v4 VM series is backed by an all-new Azure-engineered AMD Rome-powered platform with the latest hardware standards like PCIe Gen 4 built into all major system components. PCIe Gen 4 and NVIDIA’s third-generation NVLink architecture, which provides the fastest GPU-to-GPU interconnection within each VM, keep data moving through the system more than 2x faster than before.
Most customers will see an immediate boost of 2x to 3x compute performance over the previous generation of systems based on NVIDIA V100 GPUs with no engineering work. Customers leveraging new A100 features like multi-precision Tensor Cores with sparsity acceleration and Multi-Instance GPU (MIG) can achieve a boost of up to 20x.
“Leveraging NVIDIA’s most advanced compute and networking capabilities, Azure has architected an incredible platform for AI at scale in the cloud. Through an elastic architecture that can scale from a single partition of an NVIDIA A100 GPU to thousands of A100 GPUs with NVIDIA Mellanox Infiniband interconnects, Azure customers will be able to run the world’s most demanding AI workloads.” – Ian Buck, General Manager and Vice President of Accelerated Computing at NVIDIA
The ND A100 v4 VM series leverages Azure core scalability blocks like VM Scale Sets to transparently configure clusters of any size automatically and dynamically. This will allow anyone, anywhere, to achieve AI at any scale, even instantiating an AI supercomputer on demand in minutes. You can then access VMs independently or launch and manage training jobs across the cluster using the Azure Machine Learning service.
The ND A100 v4 VM series and clusters are now in preview and will become a standard offering in the Azure portfolio, allowing anyone to unlock the potential of AI at Scale in the cloud. Please reach out to your local Microsoft account team for more information.