{"id":32847,"date":"2024-03-27T12:00:00","date_gmt":"2024-03-27T19:00:00","guid":{"rendered":"https:\/\/azure.microsoft.com\/en-us\/blog\/?p=32847"},"modified":"2024-03-27T14:21:36","modified_gmt":"2024-03-27T21:21:36","slug":"microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/","title":{"rendered":"Microsoft Azure delivers game-changing performance for generative AI Inference"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Microsoft Azure has delivered industry-leading results for AI inference workloads among cloud service providers in the most recent&nbsp;<a href=\"https:\/\/mlcommons.org\/2024\/03\/mlperf-inference-v4\/\" target=\"_blank\" rel=\"noreferrer noopener\">MLPerf Inference results<\/a>&nbsp;published publicly by MLCommons. The Azure results were achieved using the new <a href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-high-performance-computing\/new-azure-nc-h100-v5-vms-optimized-for-generative-ai-and-hpc\/ba-p\/4087034\" target=\"_blank\" rel=\"noreferrer noopener\">NC H100 v5 series<\/a> virtual machines (VMs) powered by NVIDIA H100 NVL Tensor Core GPUs and reinforced the commitment from Azure to designing AI infrastructure that is optimized for training and inferencing in the cloud.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-evolution-of-generative-ai-models\">The evolution of generative AI models<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Models for generative AI are rapidly expanding in size and complexity, reflecting a prevailing trend in the industry toward ever-larger architectures. Industry-standard benchmarks and cloud-native workloads consistently push the boundaries, with models now reaching billions and even trillions of parameters. 
A prime example of this trend is the recent unveiling of Llama 2, which has 70 billion parameters, making it MLPerf\u2019s most significant test of generative AI to date (figure 1). The jump in model size is clear when comparing it to previous industry standards such as the large language model GPT-J, which has roughly 10x fewer parameters. Such rapid growth underscores the evolving demands and ambitions within the AI industry, as customers strive to tackle increasingly complex tasks and generate more sophisticated outputs.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Tailored specifically to address the dense or generative inferencing needs that models like Llama 2 require, the Azure NC H100 v5 VMs mark a significant leap forward in performance for generative AI applications. Their purpose-driven design ensures optimized performance, making them an ideal choice for organizations seeking to harness the power of AI with reliability and efficiency. 
With the NC H100 v5-series, customers can expect enhanced capabilities with these new standards for their AI infrastructure, empowering them to tackle complex tasks with ease and efficiency.&nbsp;<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-2024-03-26-113640.webp\" alt=\"Graph highlighting that the size of the models in the MLPerf Benchmarking suite is increasing, up to 70 billion parameters.\" class=\"wp-image-32876 webp-format\" style=\"width:580px;height:auto\" data-orig-src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Screenshot-2024-03-26-113640.webp\"><figcaption class=\"wp-element-caption\">Figure 1: Evolution of the size of the models in the MLPerf Inference benchmarking suite.&nbsp;<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">However, the transition to larger model sizes necessitates a shift toward a different class of hardware that is capable of accommodating the large models on fewer GPUs. This paradigm shift presents a unique opportunity for high-end systems, highlighting the capabilities of advanced solutions like the NC H100 v5 series. 
As the industry continues to embrace the era of mega-models, the NC H100 v5 series stands ready to meet the challenges of tomorrow\u2019s AI workloads, offering unparalleled performance and scalability in the face of ever-expanding model sizes.<\/p>\n\n\n<div class=\"wp-block-msxcm-cta-block\" data-moray data-bi-an=\"CTA Block\">\n\t<div class=\"card d-block mx-ng mx-md-0\">\n\t\t<div class=\"row no-gutters\">\n\n\t\t\t\t\t\t\t<div class=\"col-md-4\">\n\t\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-1024x683.jpg\" class=\"card-img img-object-cover\" alt=\"a person sitting at a table using a laptop\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-1024x683.jpg 1024w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-300x200.jpg 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-768x512.jpg 768w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-1536x1024.jpg 1536w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/12\/MSC24-India-business-Adobe-550638570-rgb-2048x1365.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/>\t\t\t\t<\/div>\n\t\t\t\n\t\t\t<div class=\"d-flex col-md\">\n\t\t\t\t<div class=\"card-body align-self-center p-4 p-md-5\">\n\t\t\t\t\t\n\t\t\t\t\t<h2>Azure AI infrastructure<\/h2>\n\n\t\t\t\t\t<div class=\"mb-3\">\n\t\t\t\t\t\t<p>World-class infrastructure performance for AI workloads<\/p>\n\t\t\t\t\t<\/div>\n\n\t\t\t\t\t\t\t\t\t\t\t<div class=\"link-group\">\n\t\t\t\t\t\t\t<a href=\"https:\/\/azure.microsoft.com\/en-us\/solutions\/high-performance-computing\/ai-infrastructure\" class=\"btn 
btn-link text-decoration-none p-0\" target=\"_blank\">\n\t\t\t\t\t\t\t\t<span>Learn more<\/span>\n\t\t\t\t\t\t\t\t<span class=\"glyph-append glyph-append-chevron-right glyph-append-xsmall\"><\/span>\n\t\t\t\t\t\t\t<\/a>\n\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t<\/div>\n\n\t\t\t\t\t<\/div>\n\t<\/div>\n<\/div>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"enhanced-performance-with-purpose-built-ai-infrastructure\">Enhanced performance with purpose-built AI infrastructure<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The&nbsp;<a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/virtual-machines\/ncads-h100-v5\" target=\"_blank\" rel=\"noreferrer noopener\">NC H100 v5-series<\/a>&nbsp;shines with purpose-built infrastructure, featuring a superior hardware configuration that yields remarkable performance gains compared to its predecessors. Each GPU within this series is equipped with 94GB of HBM3 memory. This translates to a 17.5% boost in memory size and a 64% boost in memory bandwidth over the previous generations. Powered by NVIDIA H100 NVL PCIe GPUs and 4th-generation AMD EPYC\u2122 Genoa processors, these virtual machines feature up to 2 GPUs, alongside up to 96 non-multithreaded AMD EPYC Genoa processor cores and 640 GiB of system memory.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In <a href=\"https:\/\/mlcommons.org\/2024\/03\/mlperf-inference-v4\/\">today\u2019s announcement<\/a> from MLCommons, the NC H100 v5 series premiered performance results in the MLPerf Inference v4.0 benchmark suite. Noteworthy among these achievements is a 46% performance gain over competing products equipped with GPUs of 80GB of memory (figure 2), solely based on the impressive 17.5% increase in memory size (94 GB) of the NC H100 v5-series. This leap in performance is attributed to the series&#8217; ability to fit the large models into fewer GPUs efficiently. 
For smaller models like GPT-J with 6 billion parameters, there is a notable 1.6x speedup from the previous generation (NC A100 v4) to the new NC H100 v5. This enhancement is particularly advantageous for customers with dense inferencing jobs, as it enables them to run multiple tasks in parallel with greater speed and efficiency while utilizing fewer resources.<\/p>\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Michelle-picture-2.webp\" alt=\"chart, bar chart, waterfall chart\" class=\"wp-image-32914 webp-format\" style=\"width:726px;height:auto\" data-orig-src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Michelle-picture-2.webp\"><figcaption class=\"wp-element-caption\">Figure 2: Azure results on the model Llama 2 (70 billion parameters) from MLPerf Inference v4.0 in March 2024 (4.0-0004) and (4.0-0068).&nbsp;<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"performance-delivering-a-competitive-edge\">Performance delivering a competitive edge<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The increase in performance is important not just compared to previous generations of comparable infrastructure solutions. In the MLPerf benchmark results, Azure\u2019s NC H100 v5 series virtual machines also stand out among the cloud computing submissions. Notably, when compared to cloud offerings with smaller memory capacities per accelerator, such as those with 16GB memory per accelerator, the NC H100 v5 series VMs exhibit a substantial performance boost. With nearly six times the memory per accelerator, Azure\u2019s purpose-built AI infrastructure series demonstrates a performance speedup of 8.6x to 11.6x (figure 3). This represents a performance increase of 50% to 100% for every byte of GPU memory, showcasing the unparalleled capacity of the NC H100 v5 series. 
These results underscore the series\u2019 capacity to lead the performance standards in cloud computing, offering organizations a robust solution to address their evolving computational requirements.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"580\" height=\"338\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/image-7.png\" alt=\"Figure 3: The throughput of the Azure NC H100 v5 virtual machine is up to 11.6 times higher than its equivalents with 16GB of memory per GPU.\" class=\"wp-image-32853\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/image-7.png 580w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/image-7-300x175.png 300w\" sizes=\"auto, (max-width: 580px) 100vw, 580px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Performance results on the model GPT-J (6 billion parameters) from MLPerf Inference v4.0 in March 2024 on Azure NC H100 v5 (4.0-0004) and an offering with 16GB of memory per accelerator (4.0-0045) \u2013 with one accelerator each.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In conclusion, the launch of the NC H100 v5 series marks a significant milestone in Azure\u2019s relentless pursuit of innovation in cloud computing. With its outstanding performance, advanced hardware capabilities, and seamless integration with Azure\u2019s ecosystem, the NC H100 v5 series is revolutionizing the landscape of AI infrastructure, enabling organizations to fully leverage the potential of generative AI inference workloads. The latest MLPerf Inference v4.0 results underscore the NC H100 v5 series\u2019 unparalleled capacity to excel in the most demanding AI workloads, setting a new standard for performance in the industry. 
With its exceptional performance metrics and enhanced efficiency, the NC H100 v5 series reaffirms its position as a frontrunner in the realm of AI infrastructure, empowering organizations to unlock new possibilities and achieve greater success in their AI initiatives. Furthermore, Microsoft\u2019s commitment, as announced during the NVIDIA GPU Technology Conference (GTC), to continue innovating by introducing even more powerful GPUs to the cloud, such as the NVIDIA Grace Blackwell GB200 Tensor Core GPUs, further enhances the prospects for advancing AI capabilities and driving transformative change in the cloud computing landscape.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"learn-more-about-azure-generative-ai\">Learn more about Azure generative AI<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/techcommunity.microsoft.com\/t5\/azure-high-performance-computing\/new-azure-nc-h100-v5-vms-optimized-for-generative-ai-and-hpc\/ba-p\/4087034\" target=\"_blank\" rel=\"noreferrer noopener\">New Azure NC H100 v5 VMs Optimized for generative AI and HPC workloads is now generally available\u2014Microsoft Community Hub<\/a>\u00a0<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/virtual-machines\/ncads-h100-v5\" target=\"_blank\" rel=\"noreferrer noopener\">NCads H100 v5-series\u2014Azure Virtual Machines | Microsoft Learn<\/a>&nbsp;<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/mlcommons.org\/benchmarks\/inference-datacenter\/\" target=\"_blank\" rel=\"noreferrer noopener\">Benchmark MLPerf Inference: Datacenter | MLCommons V4.0<\/a><\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Microsoft Azure has delivered industry-leading results for AI inference workloads amongst cloud service providers in the most recent MLPerf Inference results published publicly by MLCommons. 
The Azure results were achieved using the new NC H100 v5 Virtual Machines (VMs) and reinforced the commitment from Azure to designing AI infrastructure that is optimized for training and inferencing in the cloud.<\/p>\n","protected":false},"author":45,"featured_media":32869,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1454],"tags":[],"audience":[3057,3055,3056],"content-type":[1481],"product":[1803,1455],"tech-community":[],"topic":[],"coauthors":[2731,71],"class_list":["post-32847","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-machine-learning","audience-data-professionals","audience-developers","audience-it-implementors","content-type-thought-leadership","product-azure-ai","product-virtual-machines","review-flag-1-1680286581-825","review-flag-2-1680286581-601","review-flag-3-1680286581-173","review-flag-4-1680286581-250","review-flag-5-1680286581-950","review-flag-6-1680286581-909","review-flag-8-1680286581-263","review-flag-lever-1680286579-649","review-flag-microsofts","review-flag-new-1680286579-546"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Microsoft Azure delivers game-changing performance for generative AI Inference | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" 
href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Microsoft Azure delivers game-changing performance for generative AI Inference | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2024-03-27T19:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2024-03-27T21:21:36+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1260\" \/>\n\t<meta property=\"og:image:height\" content=\"708\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Hugo Affaticati, Eric Lockard\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Hugo Affaticati, Eric Lockard\" \/>\n\t<meta name=\"twitter:label2\" 
content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hugo-affaticati\/\",\"@type\":\"Person\",\"@name\":\"Hugo Affaticati\"},{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/eric-lockard\/\",\"@type\":\"Person\",\"@name\":\"Eric Lockard\"}],\"headline\":\"Microsoft Azure delivers game-changing performance for generative AI Inference\",\"datePublished\":\"2024-03-27T19:00:00+00:00\",\"dateModified\":\"2024-03-27T21:21:36+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\"},\"wordCount\":1062,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\",\"articleSection\":[\"AI + machine learning\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\",\"name\":\"Microsoft Azure delivers game-changing performance 
for generative AI Inference | Microsoft Azure Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\",\"datePublished\":\"2024-03-27T19:00:00+00:00\",\"dateModified\":\"2024-03-27T21:21:36+00:00\",\"description\":\"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg\",\"width\":1260,\"height\":708,\"caption\":\"Azure Virtual machines 
icon\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI + machine learning\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Microsoft Azure delivers game-changing performance for generative AI Inference\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure 
Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c202d869dd6f3cb29ea80999e19313a9\",\"name\":\"Jordan Davis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g4accb07cb584a4dd53673b002bf33930\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g\",\"caption\":\"Jordan Davis\"},\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/jordandavis\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Microsoft Azure delivers game-changing performance for generative AI Inference | Microsoft Azure Blog","description":"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/","og_locale":"en_US","og_type":"article","og_title":"Microsoft Azure delivers game-changing performance for generative AI Inference | Microsoft Azure Blog","og_description":"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/","og_site_name":"Microsoft Azure Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2024-03-27T19:00:00+00:00","article_modified_time":"2024-03-27T21:21:36+00:00","og_image":[{"width":1260,"height":708,"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","type":"image\/jpeg"}],"author":"Hugo Affaticati, Eric Lockard","twitter_card":"summary_large_image","twitter_image":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Hugo Affaticati, Eric Lockard","Est. 
reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hugo-affaticati\/","@type":"Person","@name":"Hugo Affaticati"},{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/eric-lockard\/","@type":"Person","@name":"Eric Lockard"}],"headline":"Microsoft Azure delivers game-changing performance for generative AI Inference","datePublished":"2024-03-27T19:00:00+00:00","dateModified":"2024-03-27T21:21:36+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/"},"wordCount":1062,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","articleSection":["AI + machine learning"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/","name":"Microsoft Azure delivers game-changing performance for generative AI Inference | Microsoft Azure 
Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","datePublished":"2024-03-27T19:00:00+00:00","dateModified":"2024-03-27T21:21:36+00:00","description":"Learn more about how Microsoft delivers performance for Generative AI inference by introducing even more powerful GPUs to the cloud.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/03\/Azure_Blog_3D_Illustration-08_1260x708.jpg","width":1260,"height":708,"caption":"Azure Virtual machines icon"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-azure-delivers-game-changing-performance-for-generative-ai-inference\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"AI + 
machine learning","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/"},{"@type":"ListItem","position":3,"name":"Microsoft Azure delivers game-changing performance for generative AI Inference"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c202d869dd6f3cb29ea80999e19313a9","name":"Jordan 
Davis","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g4accb07cb584a4dd53673b002bf33930","url":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g","caption":"Jordan Davis"},"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/jordandavis\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/32847","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/45"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=32847"}],"version-history":[{"count":0,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/32847\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media\/32869"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=32847"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=32847"},{"taxonomy":"post_tag","embeddable":true,"href":"https:
\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=32847"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=32847"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=32847"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=32847"},{"taxonomy":"tech-community","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=32847"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=32847"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=32847"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}