Skip to main content
Azure
  • 7 min read

Building the future together: Microsoft and NVIDIA announce AI advancements at GTC DC

Wordmarks for Nvidia and Microsoft.

Tech Community

Connect with a community to find answers, ask questions, build skills, and accelerate your learning.

Visit the Azure AI Foundry tech community
New offerings in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents.

Microsoft and NVIDIA are deepening our partnership to power the next wave of AI industrial innovation. For years, our companies have helped fuel the AI revolution, bringing the world’s most advanced supercomputing to the cloud, enabling breakthrough frontier models, and making AI more accessible to organizations everywhere. Today, we’re building on that foundation with new advancements that deliver greater performance, capability, and flexibility.

With added support for NVIDIA RTX PRO 6000 Blackwell Server Edition on Azure Local, customers can deploy AI and visual computing workloads distributed and edge environments with the seamless orchestration and management you use in the cloud. New NVIDIA Nemotron and NVIDIA Cosmos models in Azure AI Foundry give businesses an enterprise-grade platform to build, deploy, and scale AI applications and agents. With NVIDIA Run:ai on Azure, enterprises can get more from every GPU to streamline operations and accelerate AI. Finally, Microsoft is redefining AI infrastructure with the world’s first deployment of NVIDIA GB300 NVL72.

Today’s announcements mark the next chapter in our full-stack AI collaboration with NVIDIA, empowering customers to build the future faster.

Expanding GPU support to Azure Local

Microsoft and NVIDIA continue to drive advancements in artificial intelligence, offering innovative solutions that span the public and private cloud, the edge, and sovereign environments.

As highlighted in the March blog post for NVIDIA GTC, Microsoft will offer NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure. Now, with expanded availability of NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs on Azure Local, organizations can optimize their AI workloads, regardless of location, to provide customers with greater flexibility and more options than ever. Azure Local leverages Azure Arc to empower organizations to run advanced AI workloads on-premises while retaining the management simplicity of the cloud or operating in fully disconnected environments. 

NVIDIA RTX PRO 6000 Blackwell GPUs provide the performance and flexibility needed to accelerate a broad range of use cases, from agentic AI, physical AI, and scientific computing to rendering, 3D graphics, digital twins, simulation, and visual computing. This expanded GPU support unlocks a range of edge use cases that fulfill the stringent requirements of critical infrastructure for our healthcare, retail, manufacturing, government, defense, and intelligence customers. This may include real-time video analytics for public safety, predictive maintenance in industrial settings, rapid medical diagnostics, and secure, low-latency inferencing for essential services such as energy production and critical infrastructure. The NVIDIA RTX PRO 6000 Blackwell enables improved virtual desktop support by leveraging NVIDIA vGPU technology and Multi-Instance GPU (MIG) capabilities. This can not only accommodate a higher user density, but also power AI-enhanced graphics and visual compute capabilities, offering an efficient solution for demanding virtual environments.

Earlier this year, Microsoft announced a multitude of AI capabilities at the edge, all enriched with NVIDIA accelerated computing:

  • Edge Retrieval Augmented Generation (RAG): Empower sovereign AI deployments with fast, secure, and scalable inferencing on local data—supporting mission-critical use cases across government, healthcare, and industrial automation.
  • Azure AI Video Indexer enabled by Azure Arc: Enables real-time and recorded video analytics in disconnected environments—ideal for public safety and critical infrastructure monitoring or post-event analysis.

With Azure Local, customers can meet strict regulatory, data residency, and privacy requirements while harnessing the latest AI innovations powered by NVIDIA.

Whether you need ultra-low latency for business continuity, robust local inferencing, or compliance with industry regulations, we’re dedicated to delivering cutting-edge AI performance wherever your data resides. Customers now access the breakthrough performance of the NVIDIA RTX PRO 6000 Blackwell GPUs in new Azure Local solutions—including Dell AX-770, HPE ProLiant DL380 Gen12, and Lenovo ThinkAgile MX650a V4.

To find out more about upcoming availability and sign up for early ordering, visit: 

Powering the future of AI with new models on Azure AI Foundry

At Microsoft, we’re committed to bringing the most advanced AI capabilities to our customers, wherever they need them. Through our partnership with NVIDIA, Azure AI Foundry now brings world-class multimodal reasoning models directly to enterprises, deployable anywhere as secure, scalable NVIDIA NIM™ microservices. The portfolio spans a range of different use cases:

NVIDIA Nemotron Family: High accuracy open models and datasets for agentic AI

  • Llama Nemotron Nano VL 8B is available now and is tailored for multimodal vision-language tasks, document intelligence and understanding, and mobile and edge AI agents. 
  • NVIDIA Nemotron Nano 9B is available now and supports enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling. 
  • NVIDIA Llama 3.3 Nemotron Super 49B 1.5 is coming soon and is designed for enterprise agents, scientific reasoning, advanced math, and coding for software engineering and tool calling.

NVIDIA Cosmos Family: Open world foundation models for physical AI

  • Cosmos Reason-1 7B is available now and supports robotics planning and decision making, training data curation and annotation for autonomous vehicles, and video analytics AI agents extracting insights and performing root-cause analysis from video data.
  • NVIDIA Cosmos Predict 2.5 is coming soon and is a generalist model for world state generation and prediction. 
  • NVIDIA Cosmos Transfer 2.5 is coming soon and is designed for structural conditioning and physical AI.

Microsoft TRELLIS by Microsoft Research: High-quality 3D asset generation 

  • Microsoft TRELLIS by Microsoft Research is available now and enables digital twins by generating accurate 3D assets from simple prompts, immersive retail experiences with photorealistic product models for AR and virtual try-ons, and game and simulation development by turning creative ideas into production-ready 3D content.

Together, these open models reflect the depth of the Azure and NVIDIA partnership: combining Microsoft’s adaptive cloud with NVIDIA’s leadership in accelerated computing to power the next generation of agentic AI for every industry. Learn more about the models here.

Maximizing GPU utilization for enterprise AI with NVIDIA Run:ai on Azure

As an AI workload and GPU orchestration platform, NVIDIA Run:ai helps organizations make the most of their compute investments, accelerating AI development cycles and driving faster time-to-market for new insights and capabilities. By bringing NVIDIA Run:ai to Azure, we’re giving enterprises the ability to dynamically allocate, share, and manage GPU resources across teams and workloads, helping them get more from every GPU.

NVIDIA Run:ai on Azure integrates seamlessly with core Azure services, including Azure NC and ND series instances, Azure Kubernetes Service (AKS), and Azure Identity Management, and offers compatibility with Azure Machine Learning and Azure AI Foundry for unified, enterprise-ready AI orchestration. We’re bringing hybrid scale to life to help customers transform static infrastructure into a flexible, shared resource for AI innovation.

With smarter orchestration and cloud-ready GPU pooling, teams can drive faster innovation, reduce costs, and unleash the power of AI across their organizations with confidence. NVIDIA Run:ai on Azure enhances AKS with GPU-aware scheduling, helping teams allocate, share, and prioritize GPU resources more efficiently. Operations are streamlined with one-click job submission, automated queueing, and built in governance. This ensures teams spend less time managing infrastructure and more time focused on building what’s next. 

This impact spans industries, supporting the infrastructure and orchestration behind transformative AI workloads at every stage of enterprise growth: 

  • Healthcare organizations can use NVIDIA Run:ai on Azure to advance medical imaging analysis and drug discovery workloads across hybrid environments. 
  • Financial services organizations can orchestrate and scale GPU clusters for complex risk simulations and fraud detection models. 
  • Manufacturers can accelerate computer vision training models for improved quality control and predictive maintenance in their factories. 
  • Retail companies can power real-time recommendation systems for more personalized experiences through efficient GPU allocation and scaling, ultimately better serving their customers.

Powered by Microsoft Azure and NVIDIA, Run:ai is purpose-built for scale, helping enterprises move from isolated AI experimentation to production-grade innovation.

Reimagining AI at scale: First to deploy NVIDIA GB300 NVL72 supercomputing cluster

Microsoft is redefining AI infrastructure with the new NDv6 GB300 VM series, delivering the first at-scale production cluster of NVIDIA GB300 NVL72 systems, featuring over 4600 NVIDIA Blackwell Ultra GPUs connected via NVIDIA Quantum-X800 InfiniBand networking. Each NVIDIA GB300 NVL72 rack integrates 72 NVIDIA Blackwell Ultra GPUs and 36 NVIDIA Grace™ CPUs, delivering over 130 TB/s of NVLink bandwidth and up to 136 kW of compute power in a single cabinet. Designed for the most demanding workloads—reasoning models, agentic systems, and multimodal AI—GB300 NVL72 combines ultra-dense compute, direct liquid cooling, and smart rack-scale management to deliver breakthrough efficiency and performance within a standard datacenter footprint. 

Azure’s co-engineered infrastructure enhances GB300 NVL72 with technologies like Azure Boost for accelerated I/O and integrated hardware security modules (HSM) for enterprise-grade protection. Each rack arrives pre-integrated and self-managed, enabling rapid, repeatable deployment across Azure’s global fleet. As the first cloud provider to deploy NVIDIA GB300 NVL72 at scale, Microsoft is setting a new standard for AI supercomputing—empowering organizations to train and deploy frontier models faster, more efficiently, and more securely than ever before. Together, Azure and NVIDIA are powering the future of AI. 

Learn more about Microsoft’s systems approach in delivering GB300 NVL72 on Azure.

Unleashing the performance of ND GB200-v6 VMs with NVIDIA Dynamo 

Our collaboration with NVIDIA focuses on optimizing every layer of the computing stack to help customers maximize the value of their existing AI infrastructure investments. 

To deliver high-performance inference for compute-intensive reasoning models at scale, we’re bringing together a solution that combines the open-source NVIDIA Dynamo framework, our ND GB200-v6 VMs with NVIDIA GB200 NVL72 and Azure Kubernetes Service(AKS). We’ve demonstrated the performance this combined solution delivers at scale with the gpt-oss 120b model processing 1.2 million tokens per second deployed in a production-ready, managed AKS cluster and have published a deployment guide for developers to get started today. 

Dynamo is an open-source, distributed inference framework designed for multi-node environments and rack-scale accelerated compute architectures. By enabling disaggregated serving, LLM-aware routing and KV caching, Dynamo significantly boosts performance for reasoning models on Blackwell, unlocking up to 15x more throughput compared to the prior Hopper generation, opening new revenue opportunities for AI service providers. 

These efforts enable AKS production customers to take full advantage of NVIDIA Dynamo’s  inference optimizations when deploying frontier reasoning models at scale. We’re dedicated to bringing the latest open-source software innovations to our customers, helping them fully realize the potential of the NVIDIA Blackwell platform on Azure. 

Learn more about Dynamo on AKS.

Get more AI resources

Tech Community

Connect with a community to find answers, ask questions, build skills, and accelerate your learning.

Visit the Azure AI Foundry tech community