{"id":1142,"date":"2019-07-17T00:00:00","date_gmt":"2019-07-17T07:00:00","guid":{"rendered":"https:\/\/azure.microsoft.com\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale"},"modified":"2025-06-25T02:32:29","modified_gmt":"2025-06-25T09:32:29","slug":"microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/","title":{"rendered":"Microsoft makes it easier to build popular language representation model BERT at large scale"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\"><em>This post is co-authored by Rangan Majumder, Group Program Manager, Bing and Maxim Lukiyanov, Principal Program Manager, Azure Machine Learning. <\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning-service\/\">Azure Machine Learning<\/a>, so that customers can unlock the power of training custom versions of BERT-large models using their own data. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The area of natural language processing has seen an incredible amount of innovation over the past few years, with one of the most recent being BERT. <a href=\"https:\/\/arxiv.org\/abs\/1810.04805\">BERT<\/a>, a language representation created by Google AI language research, made significant advancements in the ability to capture the intricacies of language and improved the state of the art for many natural language applications, such as text classification, extraction, and question answering. The creation of this new language representation enables developers and data scientists to use BERT as a stepping-stone to solve specialized language tasks and get much better results than when building natural language processing systems from scratch.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The broad applicability of BERT means that most developers and data scientists are able to use a pre-trained variant of BERT rather than building a new version from the ground up with new data. While this is a reasonable solution if the domain\u2019s data is similar to the original model\u2019s data, it will not deliver best-in-class accuracy when crossing over to a new problem space. For example, training a model for the analysis of medical notes requires a deep understanding of the medical domain, providing career recommendations depends on insights from a large corpus of text about jobs and candidates, and legal document processing requires training on legal domain data. 
In these cases, to maximize the accuracy of the Natural Language Processing (NLP) algorithms, one needs to go beyond fine-tuning to pre-training the BERT model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Additionally, to advance language representation beyond BERT\u2019s accuracy, users will need to change the model architecture, training data, cost function, tasks, and optimization routines. All these changes need to be explored at large parameter and training data sizes. In the case of BERT-large, this can be quite substantial, as it has 340 million parameters and was trained on 2.5 billion words from Wikipedia and 800 million words from BookCorpus. To support this with Graphics Processing Units (GPUs), the most common hardware used to train deep learning-based NLP models, machine learning engineers will need distributed training support to train these large models. However, due to the complexity and fragility of configuring these distributed environments, even expert tweaking can end up with inferior results from the trained models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">To address these issues, Microsoft is open sourcing a first-of-its-kind, end-to-end recipe for training custom versions of BERT-large models on Azure. Overall this is a stable, predictable recipe that converges to a good optimum for developers and data scientists to try explorations on their own.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>\u201cFine-tuning BERT was really helpful to improve the quality of various tasks important for Bing search relevance,\u201d says Rangan Majumder, Group Program Manager at Bing, who led the open sourcing of this work.&nbsp; \u201cBut there were some tasks where the underlying data was different from the original corpus BERT was pre-trained on, and we wanted to experiment with modifying the tasks and model architecture.&nbsp; In order to enable these explorations, our team of scientists and researchers worked hard to solve how to pre-train BERT on GPUs. 
We could then build improved representations leading to significantly better accuracy on our internal tasks over BERT.&nbsp; We are excited to open source the work we did at Bing to empower the community to replicate our experiences and extend it in new directions that meet their needs.\u201d<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>\u201cTo get the training to converge to the same quality as the original BERT release on GPUs was non-trivial,\u201d says Saurabh Tiwary, Applied Science Manager at Bing.&nbsp; \u201cTo pre-train BERT we need massive computation and memory, which means we had to distribute the computation across multiple GPUs. However, doing that in a cost effective and efficient way with predictable behaviors in terms of convergence and quality of the final resulting model was quite challenging. We\u2019re releasing the work that we did to simplify the distributed training process so others can benefit from our efforts.\u201d<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"results\">Results<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To test the code, we trained a BERT-large model on a standard dataset and reproduced the results of the original paper on a set of GLUE tasks, as shown in Table 1. To give you an estimate of the compute required, in our case we ran training on an Azure ML cluster of 8xND40_v2 nodes (64 NVIDIA V100 GPUs total) for 6 days to reach the accuracy listed in the table. 
The actual numbers you see will vary based on your dataset and on which BERT model checkpoint you use for the downstream tasks.<\/p>\n\n\n\n<figure class=\"wp-block-image has-custom-border\"><a href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\"><img decoding=\"async\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\" alt=\"clip_image002\" style=\"border-radius:0px\" title=\"clip_image002\" \/><\/a><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\"><em>Table 1. GLUE development set results. Google BERT results are evaluated by using published BERT models on the development set. The \u201caverage\u201d column is a simple average over the table results. F1 scores are reported for QQP and MRPC, Spearman correlations are reported for STS-B, and accuracy scores are reported for the other tasks. The results for tasks with smaller dataset sizes have significant variation and may require multiple fine-tuning runs to reproduce the results.<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The code is open source and available on the <a href=\"https:\/\/github.com\/microsoft\/AzureML-BERT\">Azure Machine Learning BERT GitHub repo<\/a>. 
The repo includes:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">A PyTorch implementation of the BERT model from the <a href=\"https:\/\/github.com\/huggingface\/pytorch-pretrained-BERT\">Hugging Face repo<\/a>.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Raw and pre-processed English Wikipedia dataset.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Data preparation scripts.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Implementation of optimization techniques such as gradient accumulation and mixed precision.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">An <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning-service\/\">Azure Machine Learning service<\/a> Jupyter notebook to launch pre-training of the model.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">A set of pre-trained models that can be used in fine-tuning experiments.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Example code with a notebook to perform fine-tuning experiments.<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">With a simple \u201cRun All\u201d command, developers and data scientists can train their own BERT model using the provided Jupyter notebook in <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning-service\/\">Azure Machine Learning service<\/a>. The code, data, scripts, and tooling can also run in any other training environment.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"summary\">Summary<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">We could not have achieved these results without leveraging the amazing work of the researchers before us, and we hope that the community can take our work and go even further. 
If you have any questions or feedback, please head over to our <a href=\"https:\/\/github.com\/microsoft\/AzureML-BERT\">GitHub repo<\/a> and let us know how we can make it better.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Learn how <a href=\"https:\/\/azure.microsoft.com\/en-us\/services\/machine-learning-service\/\">Azure Machine Learning<\/a> can help you streamline the building, training, and deployment of machine learning models. <a href=\"https:\/\/azure.microsoft.com\/en-us\/free\/services\/machine-learning\/\">Start free today<\/a>.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. This will enable developers and data scientists to build their own general-purpose language representation beyond 
BERT.<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1454,1485],"tags":[],"audience":[3057,3055,3056],"content-type":[1481],"product":[1493],"tech-community":[],"topic":[],"coauthors":[97],"class_list":["post-1142","post","type-post","status-publish","format-standard","hentry","category-ai-machine-learning","category-internet-of-things","audience-data-professionals","audience-developers","audience-it-implementors","content-type-thought-leadership","product-azure-machine-learning","review-flag-1-1680286581-825","review-flag-2-1680286581-601","review-flag-5-1680286581-950","review-flag-6-1680286581-909","review-flag-free-1680286579-836","review-flag-machi-1680286585-314","review-flag-ml-1680286585-776","review-flag-new-1680286579-546"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Microsoft makes it easier to build popular language representation model BERT at large scale | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Microsoft makes it easier to build popular language representation model BERT at large scale | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2019-07-17T07:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-06-25T09:32:29+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\" \/>\n<meta name=\"author\" content=\"Microsoft Azure\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Microsoft Azure\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/microsoft-azure\/\",\"@type\":\"Person\",\"@name\":\"Microsoft Azure\"}],\"headline\":\"Microsoft makes it easier to build popular language representation model BERT at large scale\",\"datePublished\":\"2019-07-17T07:00:00+00:00\",\"dateModified\":\"2025-06-25T09:32:29+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\"},\"wordCount\":1106,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\",\"articleSection\":[\"AI + machine learning\",\"Internet of 
things\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\",\"name\":\"Microsoft makes it easier to build popular language representation model BERT at large scale | Microsoft Azure Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\",\"datePublished\":\"2019-07-17T07:00:00+00:00\",\"dateModified\":\"2025-06-25T09:32:29+00:00\",\"description\":\"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI + machine learning\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Microsoft makes it easier to build popular language representation model BERT at large scale\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure 
Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117\",\"name\":\"shakir\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g\",\"caption\":\"shakir\"},\"sameAs\":[\"https:\/\/azure.microsoft.com\"],\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Microsoft makes it easier to build popular language representation model BERT at large scale | Microsoft Azure Blog","description":"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/","og_locale":"en_US","og_type":"article","og_title":"Microsoft makes it easier to build popular language representation model BERT at large scale | Microsoft Azure Blog","og_description":"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/","og_site_name":"Microsoft Azure Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2019-07-17T07:00:00+00:00","article_modified_time":"2025-06-25T09:32:29+00:00","og_image":[{"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif","type":"","width":"","height":""}],"author":"Microsoft Azure","twitter_card":"summary_large_image","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Microsoft Azure","Est. 
reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/microsoft-azure\/","@type":"Person","@name":"Microsoft Azure"}],"headline":"Microsoft makes it easier to build popular language representation model BERT at large scale","datePublished":"2019-07-17T07:00:00+00:00","dateModified":"2025-06-25T09:32:29+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/"},"wordCount":1106,"commentCount":0,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif","articleSection":["AI + machine learning","Internet of things"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/","name":"Microsoft makes it easier to build popular 
language representation model BERT at large scale | Microsoft Azure Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif","datePublished":"2019-07-17T07:00:00+00:00","dateModified":"2025-06-25T09:32:29+00:00","description":"Today we are announcing the open sourcing of our recipe to pre-train BERT (Bidirectional Encoder Representations from Transformers) built by the Bing team, including code that works on Azure Machine Learning, so that customers can unlock the power of training custom versions of BERT-large models for their organization. 
This will enable developers and data scientists to build their own general-purpose language representation beyond BERT.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2019\/07\/18d3c1c7-ae4a-4585-889c-3fb721c77ef3.gif"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/microsoft-makes-it-easier-to-build-popular-language-representation-model-bert-at-large-scale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"AI + machine learning","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/"},{"@type":"ListItem","position":3,"name":"Microsoft makes it easier to build popular language representation model BERT at large scale"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. 
From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure 
Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/c702e5edd662b328b49b7e1180cab117","name":"shakir","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g7664e653ea371ce16eaf75e9fa8952c4","url":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/9342c7c05bb16548741bc5cd3a3e3b7ee0c8e746844ad2cc582db5beb5514c6f?s=96&d=mm&r=g","caption":"shakir"},"sameAs":["https:\/\/azure.microsoft.com"],"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/shakir\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure 
Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1142","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=1142"}],"version-history":[{"count":1,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1142\/revisions"}],"predecessor-version":[{"id":43171,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/1142\/revisions\/43171"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=1142"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=1142"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=1142"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=1142"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=1142"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=1142"},{"taxonomy":"tech-community","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=1142"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=1142"},{"taxonomy":"author","embeddable":tr
ue,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=1142"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}