{"id":30284,"date":"2023-11-08T09:00:00","date_gmt":"2023-11-08T17:00:00","guid":{"rendered":"https:\/\/azure.microsoft.com\/en-us\/blog\/?p=30284"},"modified":"2025-06-12T00:34:07","modified_gmt":"2025-06-12T07:34:07","slug":"azure-sets-a-scale-record-in-large-language-model-training","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/","title":{"rendered":"Azure sets a scale record in large language model training"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/solutions\/high-performance-computing\/ai-infrastructure\/\">Azure<\/a> empowers intelligent services like <a href=\"https:\/\/adoption.microsoft.com\/en-us\/copilot\/\">Microsoft Copilot<\/a>, <a href=\"https:\/\/www.microsoft.com\/en-us\/bing?ep=0&amp;es=31&amp;form=MA13FV\" target=\"_blank\" rel=\"noreferrer noopener\">Bing<\/a>, and <a href=\"https:\/\/azure.microsoft.com\/products\/ai-services\/\" target=\"_blank\" rel=\"noreferrer noopener\">Azure OpenAI Service<\/a> that have captured our imagination in recent days. These services, facilitating various applications like Microsoft Office 365, chatbots, and search engines with generative AI, owe their magic to large language models (LLMs). While the latest LLMs are transcendental, bringing a generational change in how we apply artificial intelligence in our daily lives and reason about its evolution, we have merely scratched the surface. Creating more capable, fair, foundational LLMs that consume and present information more accurately is necessary.&nbsp;&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-microsoft-maximizes-the-power-of-llms\">How Microsoft maximizes the power of LLMs<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">However, creating new LLMs or improving the accuracy of existing ones is no easy feat. 
To create and train improved versions of LLMs, supercomputers with massive computational capabilities are required. It is paramount that both the hardware and software in these supercomputers are utilized efficiently at scale, not leaving performance on the table. This is where the sheer scale of the supercomputing infrastructure in the Azure cloud shines, and why setting a new scale record in LLM training matters.&nbsp;<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1530\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-scaled.jpg\" alt=\"Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002).\u00a0\" class=\"wp-image-30428\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-scaled.jpg 2560w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-300x179.jpg 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-1024x612.jpg 1024w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-768x459.jpg 768w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-1536x918.jpg 1536w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-1-MLPerf-Nov-2023-1-2048x1224.jpg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 1: Scale records on the model GPT-3 (175 billion parameters) from MLPerf Training v3.0 in June 2023 (3.0-2003) and Azure on MLPerf Training v3.1 in November 2023 (3.1-2002).<\/em>&nbsp;<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Customers need reliable and
performant infrastructure to bring the most sophisticated AI use cases to market in record time. Our objective is to build state-of-the-art infrastructure and meet these demands. The <a href=\"https:\/\/mlcommons.org\/en\/training-normal-31\/\" target=\"_blank\" rel=\"noreferrer noopener\">latest MLPerf\u2122 3.1 Training results<\/a><sup>1<\/sup> are a testament to our unwavering commitment to building high-quality and high-performance systems in the cloud to achieve unparalleled efficiency in training LLMs at scale. The idea here is to use massive workloads to stress every component of the system and accelerate our build process to achieve high quality.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The GPT-3 LLM model and its 175 billion parameters were trained to completion in four minutes on 1,344 <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/virtual-machines\/nd-h100-v5-series\" target=\"_blank\" rel=\"noreferrer noopener\">ND H100 v5<\/a> virtual machines (VMs), which represent 10,752 <a href=\"https:\/\/www.nvidia.com\/en-us\/data-center\/h100\/\" target=\"_blank\" rel=\"noreferrer noopener\">NVIDIA H100 Tensor Core GPUs<\/a>, connected by the NVIDIA Quantum-2 InfiniBand networking platform (as shown in Figure 1). This training workload uses close-to-real-world datasets and restarts from 2.4 terabytes of checkpoints, closely mirroring a production LLM training scenario. The workload stresses the H100 GPUs\u2019 Tensor Cores, direct-attached Non-Volatile Memory Express disks, the NVLink interconnect that provides fast communication to the high-bandwidth memory in the GPUs, and the cross-node 400Gb\/s InfiniBand fabric.&nbsp;<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p class=\"wp-block-paragraph\">Azure\u2019s submission, the largest in the history of MLPerf Training, demonstrates the extraordinary progress we have made in optimizing the scale of training. 
MLCommons\u2019 benchmarks showcase the prowess of modern AI infrastructure and software, underlining the continuous advancements that have been achieved, ultimately propelling us toward even more powerful and efficient AI systems.<\/p>\n<cite>David Kanter, Executive Director of MLCommons<\/cite><\/blockquote>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"microsoft-s-commitment-to-performance\">Microsoft&#8217;s commitment to performance<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">In March 2023, Microsoft introduced the ND H100 v5-series, which completed training a 350 million parameter Bidirectional Encoder Representations from Transformers (BERT) language model in 5.4 minutes, beating our existing record. This represented a fourfold improvement in the time to train BERT within just 18 months, highlighting our continuous endeavor to bring the best performance to our users.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1006\" height=\"246\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-2-MLPerf-Nov-2023.jpg\" alt=\"chart\" class=\"wp-image-30340\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-2-MLPerf-Nov-2023.jpg 1006w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-2-MLPerf-Nov-2023-300x73.jpg 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-2-MLPerf-Nov-2023-768x188.jpg 768w\" sizes=\"auto, (max-width: 1006px) 100vw, 1006px\" \/><figcaption class=\"wp-element-caption\">Figure 2: Relative size of the models BERT (350 million parameters) and GPT-3 (175 billion parameters) from MLPerf Training v3.1.\u00a0\u00a0<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">Today\u2019s results are with GPT-3, a large language model in the MLPerf Training benchmarking suite, featuring 175 billion parameters, a remarkable 500 times larger than the previously 
benchmarked BERT model (figure 2). Azure\u2019s latest training time represents a 2.7x improvement over the previous record from MLPerf Training v3.0. The v3.1 submission underscores the ability to decrease training time and cost by optimizing a model that accurately represents current AI workloads.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"the-power-of-virtualization\">The power of virtualization<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">NVIDIA\u2019s submission to the MLPerf Training v3.1 LLM benchmark on 10,752 NVIDIA H100 Tensor Core GPUs achieved a training time of 3.92 minutes. Azure VMs completed the same workload with just a 2 percent increase in training time compared to the NVIDIA bare-metal submission, the best-in-class virtual machine performance across all HPC instance offerings in the cloud (figure 3).<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"624\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-1024x624.jpg\" alt=\"chart, treemap chart\" class=\"wp-image-35966\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-1024x624.jpg 1024w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-300x183.jpg 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-768x468.jpg 768w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-1536x937.jpg 1536w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-3-MLPerf-Nov-2023-updated-2048x1249.jpg 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Figure 3: Relative training times on the model GPT-3 (175 billion parameters) from MLPerf 
Training v3.1 between the NVIDIA submission on the bare-metal platform (3.1-2007) and Azure on virtual machines (3.1-2002).<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The latest AI inferencing results on Azure ND H100 v5 VMs show leadership performance as well, as shown in <a href=\"https:\/\/mlcommons.org\/en\/inference-datacenter-31\/\" target=\"_blank\" rel=\"noreferrer noopener\">MLPerf Inference v3.1<\/a>. The ND H100 v5-series delivered 0.99x-1.05x relative performance compared to the bare-metal submissions on the same NVIDIA H100 Tensor Core GPUs (figure 4), echoing the efficiency of virtual machines.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"2560\" height=\"1572\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-scaled.jpg\" alt=\"Performance of the ND H100 v5-series (3.1-0003) compared to on-premises and bare metal offerings of the same NVIDIA H100 Tensor Core GPUs (3.1-0107 and 3.1-0121). 
All the results were obtained with the GPT-J benchmark from MLPerf Inference v3.1, scenarios: Offline and Server, accuracy: 99 percent.\" class=\"wp-image-30434\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-scaled.jpg 2560w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-300x184.jpg 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-1024x629.jpg 1024w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-768x471.jpg 768w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-1536x943.jpg 1536w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/Figure-4-MLPerf-Nov-2023-2-2048x1257.jpg 2048w\" sizes=\"auto, (max-width: 2560px) 100vw, 2560px\" \/><figcaption class=\"wp-element-caption\"><em>Figure 4: Performance of the ND H100 v5-series (3.1-0003) compared to on-premises and bare metal offerings of the same NVIDIA H100 Tensor Core GPUs (3.1-0107 and 3.1-0121). 
All the results were obtained with the GPT-J benchmark from MLPerf Inference v3.1, scenarios: Offline and Server, accuracy: 99<\/em> percent.<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">In conclusion, built for performance, scalability, and adaptability, the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the cloud, delivering the highest quality infrastructure for AI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"learn-more-about-azure-ai-infrastructure\">Learn more about Azure AI Infrastructure<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/virtual-machines\/nd-h100-v5-series\" target=\"_blank\" rel=\"noreferrer noopener\">ND H100 v5<\/a>&nbsp;<\/li>\n\n\n\n<li class=\"wp-block-list-item\"><a href=\"https:\/\/aka.ms\/AzureAIInfrastructure\" target=\"_blank\" rel=\"noreferrer noopener\">Azure AI Infrastructure<\/a><\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\" \/>\n\n\n\n<p class=\"wp-block-paragraph\">References <\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">MLCommons\u00ae is an open engineering consortium of AI leaders from academia, research labs, and industry. They build fair and useful benchmarks that provide unbiased evaluations of training and inference performance for hardware, software, and services\u2014all conducted under prescribed conditions. MLPerf\u2122 Training benchmarks consist of real-world compute-intensive AI workloads to best simulate customers\u2019 needs. Tests are transparent and objective, so technology decision-makers can rely on the results to make informed buying decisions.&nbsp;<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>Customers need reliable and performant infrastructure to bring the most sophisticated AI use cases to market in record time. 
Our objective is to build state-of-the-art infrastructure and meet these demands. The latest MLPerf\u2122 3.1 Training results1 are a testament to our unwavering commitment to building high-quality and high-performance systems in the cloud to achieve unparalleled efficiency in training LLMs at scale.<\/p>\n","protected":false},"author":45,"featured_media":30483,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1454],"tags":[3165,3168],"audience":[3057,3055,3053,3056],"content-type":[1497,1481],"product":[1795],"tech-community":[],"topic":[],"coauthors":[1758,2731,2730],"class_list":["post-30284","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-machine-learning","tag-language-models","tag-large-language-models-llms","audience-data-professionals","audience-developers","audience-it-decision-makers","audience-it-implementors","content-type-partnerships","content-type-thought-leadership","product-azure-openai","review-flag-1680286581-295","review-flag-1-1680286581-825","review-flag-2-1680286581-601","review-flag-3-1680286581-173","review-flag-4-1680286581-250","review-flag-5-1680286581-950","review-flag-artif-1680286586-345","review-flag-new-1680286579-546","review-flag-percent"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Azure sets a scale record in large language model training | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the cloud.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, 
max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Azure sets a scale record in large language model training | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the cloud.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2023-11-08T17:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-12T07:34:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1260\" \/>\n\t<meta property=\"og:image:height\" content=\"708\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Kushal Datta, Hugo Affaticati, Hyunseung Harry Yoo\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kushal Datta, Hugo Affaticati, Hyunseung Harry Yoo\" \/>\n\t<meta 
name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/kushal-datta\/\",\"@type\":\"Person\",\"@name\":\"Kushal Datta\"},{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hugo-affaticati\/\",\"@type\":\"Person\",\"@name\":\"Hugo Affaticati\"},{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hyunseung-harry-yoo\/\",\"@type\":\"Person\",\"@name\":\"Hyunseung Harry Yoo\"}],\"headline\":\"Azure sets a scale record in large language model training\",\"datePublished\":\"2023-11-08T17:00:00+00:00\",\"dateModified\":\"2025-06-12T07:34:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\"},\"wordCount\":919,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\",\"keywords\":[\"Language models\",\"Large language models (LLMs)\"],\"articleSection\":[\"AI + machine 
learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\",\"name\":\"Azure sets a scale record in large language model training | Microsoft Azure Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\",\"datePublished\":\"2023-11-08T17:00:00+00:00\",\"dateModified\":\"2025-06-12T07:34:07+00:00\",\"description\":\"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the 
cloud.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg\",\"width\":1260,\"height\":708,\"caption\":\"Image of a Data center operator\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI + machine learning\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Azure sets a scale record in large language model training\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c202d869dd6f3cb29ea80999e19313a9\",\"name\":\"Jordan 
Davis\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g4accb07cb584a4dd53673b002bf33930\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g\",\"caption\":\"Jordan Davis\"},\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/jordandavis\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Azure sets a scale record in large language model training | Microsoft Azure Blog","description":"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the cloud.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/","og_locale":"en_US","og_type":"article","og_title":"Azure sets a scale record in large language model training | Microsoft Azure Blog","og_description":"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the cloud.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/","og_site_name":"Microsoft Azure 
Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2023-11-08T17:00:00+00:00","article_modified_time":"2025-06-12T07:34:07+00:00","og_image":[{"width":1260,"height":708,"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","type":"image\/jpeg"}],"author":"Kushal Datta, Hugo Affaticati, Hyunseung Harry Yoo","twitter_card":"summary_large_image","twitter_image":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Kushal Datta, Hugo Affaticati, Hyunseung Harry Yoo","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/kushal-datta\/","@type":"Person","@name":"Kushal Datta"},{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hugo-affaticati\/","@type":"Person","@name":"Hugo Affaticati"},{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/hyunseung-harry-yoo\/","@type":"Person","@name":"Hyunseung Harry Yoo"}],"headline":"Azure sets a scale record in large language model 
training","datePublished":"2023-11-08T17:00:00+00:00","dateModified":"2025-06-12T07:34:07+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/"},"wordCount":919,"commentCount":0,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","keywords":["Language models","Large language models (LLMs)"],"articleSection":["AI + machine learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/","name":"Azure sets a scale record in large language model training | Microsoft Azure Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","datePublished":"2023-11-08T17:00:00+00:00","dateModified":"2025-06-12T07:34:07+00:00","description":"Learn more about how the Azure ND H100 v5-series offers exceptional throughput and minimal latency for both training and inferencing tasks in the 
cloud.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2023\/11\/MSC17_dataCenter_020.jpg","width":1260,"height":708,"caption":"Image of a Data center operator"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-sets-a-scale-record-in-large-language-model-training\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"AI + machine learning","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/"},{"@type":"ListItem","position":3,"name":"Azure sets a scale record in large language model training"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c202d869dd6f3cb29ea80999e19313a9","name":"Jordan Davis","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g4accb07cb584a4dd53673b002bf33930","url":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/ec9971e70dcc01d0fb3aee74bf0f300b2dc40f42a228ed523c90f16cae07c017?s=96&d=mm&r=g","caption":"Jordan 
Davis"},"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/jordandavis\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/30284","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/45"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=30284"}],"version-history":[{"count":0,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/30284\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media\/30483"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=30284"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=30284"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=30284"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=30284"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=30284"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=30284"},{"taxonomy":"tech-c
ommunity","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=30284"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=30284"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=30284"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}