{"id":7670,"date":"2022-07-26T00:00:00","date_gmt":"2022-07-26T00:00:00","guid":{"rendered":"https:\/\/azure.microsoft.com\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed"},"modified":"2025-06-13T02:50:45","modified_gmt":"2025-06-13T09:50:45","slug":"azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/","title":{"rendered":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><i>This blog was written in collaboration with the DeepSpeed team, the Azure ML team, and the Azure HPC team at Microsoft.<\/i><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Large-scale transformer-based deep learning models trained on large amounts of data have achieved strong results on a range of cognitive tasks in recent years and power new products and features that augment human capabilities. These models have grown by several orders of magnitude in size over the last five years, from tens of millions of parameters in the original transformer model to the 530 billion-parameter Megatron-Turing NLG (MT-NLG 530B) model, as shown in <em>Figure 1<\/em>. Customers increasingly need to train and fine-tune large models at unprecedented scale.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\" alt=\"Hardware is unable to match 200+ times growth in AI models.
DeepSpeed enables scaling AI training to thousands of nodes, achieving a 4000+ times speedup.\" title=\"Figure 1\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em><strong>Figure 1:<\/strong> Landscape of large models and hardware capabilities.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><a href=\"https:\/\/azure.microsoft.com\/services\/machine-learning\/\">Azure Machine Learning<\/a> (AzureML) brings large fleets of the latest NVIDIA GPUs, connected by NVIDIA Quantum&nbsp;InfiniBand interconnects, to tackle large-scale AI training. We already train some of the largest models, including Megatron\/Turing and GPT-3, on Azure. Previously, training these models required users to set up and maintain complex distributed training infrastructure, usually through several manual and error-prone steps. This led to a subpar experience in terms of both usability and performance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today, we are proud to announce a breakthrough in our software stack: using DeepSpeed and 1024 NVIDIA A100 GPUs, we scaled the training of a 2-trillion-parameter model with a streamlined user experience at 1K+ GPU scale.
We are bringing these software innovations to you through AzureML, including a fully optimized PyTorch environment, which offers strong performance and an easy-to-use interface for large-scale training.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Customers can now run <a href=\"https:\/\/www.deepspeed.ai\/\" target=\"_blank\" rel=\"noopener\">DeepSpeed<\/a> on Azure through simple-to-use training pipelines built on either the recommended AzureML <a href=\"https:\/\/github.com\/Azure\/azureml-examples\/tree\/main\/python-sdk\/workflows\/train\/deepspeed\/megatron-deepspeed\" target=\"_blank\" rel=\"noopener\">recipes<\/a> or bash <a href=\"https:\/\/github.com\/microsoft\/Megatron-DeepSpeed\/tree\/main\/examples\/azure\" target=\"_blank\" rel=\"noopener\">scripts<\/a> for <a href=\"https:\/\/docs.microsoft.com\/en-us\/azure\/virtual-machine-scale-sets\/overview\" target=\"_blank\" rel=\"noopener\">VMSS<\/a>-based environments. As shown in <i>Figure 2<\/i>, Microsoft is taking a full-stack optimization approach in which every necessary layer, including the hardware, the OS, the VM image, the Docker image (containing optimized PyTorch, DeepSpeed, ONNX Runtime, and other Python packages), and the user-facing Azure ML APIs, has been optimized, integrated, and thoroughly tested for performance and scalability without unnecessary complexity.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/6213ff53-aeda-4961-954e-81692ce0b768.webp\" alt=\"Stack diagram of different layers in Azure AI software.\" title=\"Figure 2\" \/><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em><strong>Figure 2:<\/strong> Microsoft full-stack optimizations for scalable distributed training on Azure.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">This optimized stack enabled us to efficiently scale the training of large models using DeepSpeed on Azure.
We are happy to share performance results showing support for <b>2x larger model sizes<\/b> (2 trillion vs. 1 trillion parameters), scaling to <b>2x more GPUs<\/b> (1024 vs. 512), and up to <b>1.8x higher compute throughput per GPU<\/b> (150 TFLOPS vs. 81 TFLOPS) compared with results published for other <a href=\"https:\/\/medium.com\/pytorch\/training-a-1-trillion-parameter-model-with-pytorch-fully-sharded-data-parallel-on-aws-3ac13aa96cff\" target=\"_blank\" rel=\"noopener\">cloud providers<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We achieve near-linear scalability both as the <b>model size increases<\/b> and as the <b>number of GPUs increases<\/b>. As shown in <i>Figure 3a<\/i>, by combining DeepSpeed <a href=\"https:\/\/www.microsoft.com\/en-us\/research\/blog\/zero-infinity-and-deepspeed-unlocking-unprecedented-model-scale-for-deep-learning-training\/\" target=\"_blank\" rel=\"noopener\">ZeRO-3<\/a> and its novel CPU offloading capabilities with a high-performance Azure stack powered by NVIDIA Quantum\u00a0InfiniBand interconnects and NVIDIA A100 GPUs, we maintained efficient throughput per GPU (&gt;157 TFLOPS), scaling near-linearly as the model size increased from 175 billion to 2 trillion parameters. Similarly, for a fixed model size such as 175B, we achieve near-linear scaling as we increase the number of GPUs from 128 to 1024, as shown in <i>Figure 3b<\/i>.
The key takeaway from the results presented in this blog is that Azure and DeepSpeed together are breaking the GPU <a href=\"https:\/\/medium.com\/riselab\/ai-and-memory-wall-2cb4265cb0b8\" target=\"_blank\" rel=\"noopener\">memory wall<\/a> and enabling our customers to easily and efficiently train trillion-parameter models at scale.<\/p>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/c81b6e6d-86c0-4a45-b7a3-f1445c8e041f.webp\" alt=\"Training throughput scales linearly with number of GPUs exhibiting near-perfect scaling efficiency on 1K GPUs.\" class=\"wp-image-22498 webp-format\" data-orig-src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/c81b6e6d-86c0-4a45-b7a3-f1445c8e041f.webp\"><\/figure>\n\n\n<figure class=\"wp-block-image size-full\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/74ceaf10-9438-4d5e-9d54-748a19775d83.webp\" alt=\"Training throughput scales linearly with number of GPUs exhibiting near-perfect scaling efficiency on 1K GPUs.\" class=\"wp-image-22500 webp-format\" data-orig-src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/74ceaf10-9438-4d5e-9d54-748a19775d83.webp\"><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">(a)&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; (b)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em><strong><em><strong>Figure 
3:<\/strong><\/em><\/strong><em><strong> (a)<\/strong> Near-perfect throughput\/GPU as we increase the model size from 175 billion to 2 trillion parameters (BS\/GPU=8), <\/em><\/em><em style=\"font-style: italic\"><strong>(b)<\/strong> Near-perfect performance scaling with the increase in number of GPU devices for the 175B model (BS\/GPU=16). The sequence length is 1024 for both cases.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"learn-more\">Learn more<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To learn more about the optimizations, technologies, and detailed performance trends presented above, please refer to our extended <a href=\"https:\/\/aka.ms\/mds-azure\" target=\"_blank\" rel=\"noopener\">technical blog<\/a>.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Learn more about <a href=\"https:\/\/www.deepspeed.ai\/\" target=\"_blank\" rel=\"noopener\">DeepSpeed<\/a>, which is part of Microsoft\u2019s <a href=\"https:\/\/www.microsoft.com\/research\/project\/ai-at-scale\/\" target=\"_blank\" rel=\"noopener\">AI at Scale<\/a> initiative.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Learn more about <a href=\"https:\/\/www.azure.com\/hpc\">Azure HPC + AI<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">To get started with DeepSpeed on Azure, please follow our <a href=\"https:\/\/www.deepspeed.ai\/tutorials\/azure\/\" target=\"_blank\" rel=\"noopener\">getting started tutorial<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The results presented in this blog were produced on Azure by following the recipes and scripts published as part of the <a href=\"https:\/\/github.com\/microsoft\/Megatron-DeepSpeed\" target=\"_blank\" rel=\"noopener\">Megatron-DeepSpeed<\/a> repository. 
The recommended and easiest way to run the training experiments is to use the <a href=\"https:\/\/github.com\/Azure\/azureml-examples\/tree\/main\/python-sdk\/workflows\/train\/deepspeed\/megatron-deepspeed\" target=\"_blank\" rel=\"noopener\">AzureML recipe<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">If you are running experiments in a custom environment built using Azure VMs or VMSS, please refer to the <a href=\"https:\/\/github.com\/microsoft\/Megatron-DeepSpeed\/tree\/main\/examples\/azure\" target=\"_blank\" rel=\"noopener\">bash scripts<\/a> we provide in Megatron-DeepSpeed.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1454,1485],"tags":[],"audience":[3057,3055,3056],"content-type":[1481],"product":[1493],"tech-community":[],"topic":[],"coauthors":[1758],"class_list":["post-7670","post","type-post","status-publish","format-standard","hentry","category-ai-machine-learning","category-internet-of-things","audience-data-professionals","audience-developers","audience-it-implementors","content-type-thought-leadership","product-azure-machine-learning","review-flag-1680286581-295","review-flag-1680286581-364","review-flag-1-1680286581-825","review-flag-2-1680286581-601","review-flag-3-1680286581-173","review-flag-8-1680286581-263","review-flag-machi-1680286585-314","review-flag-microsofts","review-flag-ml-1680286585-776","review-flag-new-1680286579-546","review-flag-vm-1680286585-143"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities.
Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2022-07-26T00:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-13T09:50:45+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\" \/>\n<meta name=\"author\" content=\"Kushal Datta\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta 
name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Kushal Datta\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/kushal-datta\/\",\"@type\":\"Person\",\"@name\":\"Kushal Datta\"}],\"headline\":\"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed\",\"datePublished\":\"2022-07-26T00:00:00+00:00\",\"dateModified\":\"2025-06-13T09:50:45+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\"},\"wordCount\":819,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\",\"articleSection\":[\"AI + machine learning\",\"Internet of 
things\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\",\"name\":\"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed | Microsoft Azure Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\",\"datePublished\":\"2022-07-26T00:00:00+00:00\",\"dateModified\":\"2025-06-13T09:50:45+00:00\",\"description\":\"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. 
Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI + machine learning\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure 
Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117\",\"name\":\"shakir\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"caption\":\"shakir\"},\"sameAs\":[\"https:\/\/azure.microsoft.com\"],\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed | Microsoft Azure Blog","description":"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. 
Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/","og_locale":"en_US","og_type":"article","og_title":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed | Microsoft Azure Blog","og_description":"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI training.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/","og_site_name":"Microsoft Azure Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2022-07-26T00:00:00+00:00","article_modified_time":"2025-06-13T09:50:45+00:00","og_image":[{"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp","type":"","width":"","height":""}],"author":"Kushal Datta","twitter_card":"summary_large_image","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Kushal Datta","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/kushal-datta\/","@type":"Person","@name":"Kushal Datta"}],"headline":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed","datePublished":"2022-07-26T00:00:00+00:00","dateModified":"2025-06-13T09:50:45+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/"},"wordCount":819,"commentCount":0,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp","articleSection":["AI + machine learning","Internet of things"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/","name":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed 
| Microsoft Azure Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp","datePublished":"2022-07-26T00:00:00+00:00","dateModified":"2025-06-13T09:50:45+00:00","description":"Large-scale transformer-based deep learning models trained on large amounts of data have shown great results in recent years in several cognitive tasks and are behind new products and features that augment human capabilities. Azure Machine Learning (AzureML) brings large fleets of the latest GPUs powered by the InfiniBand interconnect to tackle large-scale AI 
training.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2022\/07\/856b592e-756f-48fd-9bd1-198dea75ccc5.webp"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/azure-empowers-easytouse-highperformance-and-hyperscale-model-training-using-deepspeed\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"AI + machine learning","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/"},{"@type":"ListItem","position":3,"name":"Azure empowers easy-to-use, high-performance, and hyperscale model training using DeepSpeed"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure 
Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117","name":"shakir","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4","url":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","caption":"shakir"},"sameAs":["https:\/\/azure.microsoft.com"],"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure 
Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/7670","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=7670"}],"version-history":[{"count":1,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/7670\/revisions"}],"predecessor-version":[{"id":41744,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/7670\/revisions\/41744"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=7670"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=7670"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=7670"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=7670"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=7670"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=7670"},{"taxonomy":"tech-community","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=7670"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=7670"},{"taxonomy":"author","embeddable":tr
ue,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=7670"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}