{"id":38030,"date":"2025-01-08T08:00:00","date_gmt":"2025-01-08T16:00:00","guid":{"rendered":""},"modified":"2025-09-10T12:54:15","modified_gmt":"2025-09-10T19:54:15","slug":"boost-processing-performance-by-combining-ai-models","status":"publish","type":"post","link":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/","title":{"rendered":"Boost processing performance by combining AI models"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Leveraging the strengths of different AI models and bringing them together into a single application can be a great strategy to help you meet your performance objectives. This approach harnesses the power of multiple AI systems to improve accuracy and reliability in complex scenarios.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the Microsoft model catalog, there are more than 1,800 AI models available. Even more models and services are available via Azure OpenAI Service and Azure AI Foundry, so you can find the right models to build your optimal AI solution.<\/p>\n\n\n\n<aside class=\"cta-block cta-block--align-left cta-block--has-image wp-block-msx-cta\" data-bi-an=\"CTA Block\">\n\t<div class=\"cta-block__content\">\n\t\t\t\t\t<div class=\"cta-block__image-container\">\n\t\t\t\t<img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"575\" src=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/10\/Azure_Hero_Cylinder_Blue_GreenGrad-1024x575.webp\" class=\"cta-block__image\" alt=\"background pattern\" srcset=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/10\/Azure_Hero_Cylinder_Blue_GreenGrad-1024x575.webp 1024w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/10\/Azure_Hero_Cylinder_Blue_GreenGrad-300x169.webp 300w, https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/10\/Azure_Hero_Cylinder_Blue_GreenGrad-768x432.webp 768w, 
https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/10\/Azure_Hero_Cylinder_Blue_GreenGrad.webp 1260w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/>\t\t\t<\/div>\n\t\t\n\t\t<div class=\"cta-block__body\">\n\t\t\t<h2 class=\"cta-block__headline\">Azure OpenAI Service<\/h2>\n\t\t\t<p class=\"cta-block__text\">Customize various models for your specific use cases.<\/p>\n\t\t\t\t\t\t\t<div class=\"cta-block__actions\">\n\t\t\t\t\t<a\n\t\t\t\t\t\thref=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services\/openai-service\/\"\n\t\t\t\t\t\tclass=\"btn cta-block__link btn-link\"\n\t\t\t\t\t\t\t\t\t\t\t>\n\t\t\t\t\t\tTry it out\t\t\t\t\t<\/a>\n\t\t\t\t<\/div>\n\t\t\t\t\t<\/div>\n\t<\/div>\n<\/aside>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s look at how a multiple model approach works and explore some scenarios where companies successfully implemented this approach to increase performance and reduce costs.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"how-the-multiple-model-approach-works\">How the multiple model approach works<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">The multiple model approach involves combining different AI models to solve complex tasks more effectively. Models are trained for different tasks or aspects of a problem, such as language understanding, image recognition, or data analysis. Models can work in parallel and process different parts of the input data simultaneously, route to relevant models, or be used in different ways in an application.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s suppose you want to pair a fine-tuned vision model with a large language model to perform several complex imaging classification tasks in conjunction with natural language queries. Or maybe you have a small model fine-tuned to generate SQL queries on your database schema, and you\u2019d like to pair it with a larger model for more general-purpose tasks such as information retrieval and research assistance. 
In both of these cases, the multiple model approach could offer you the adaptability to build a comprehensive AI solution that fits your organization\u2019s particular requirements.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"before-implementing-a-multiple-model-strategy\"><strong>Before implementing a multiple model strategy<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">First, identify and understand the outcome you want to achieve, as this is key to selecting and deploying the right AI models. In addition, each model has its own set of merits and challenges to consider in order to ensure you choose the right ones for your goals. There are several items to consider before implementing a multiple model strategy, including:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">The intended purpose of the models.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The application\u2019s requirements around model size.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Training and management of specialized models.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The varying degrees of accuracy needed.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Governance of the application and models.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Security and bias of potential models.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">Cost of models and expected cost at scale.<\/li>\n\n\n\n<li class=\"wp-block-list-item\">The right programming language (check <a href=\"https:\/\/github.com\/symflower\/eval-dev-quality\" target=\"_blank\" rel=\"noreferrer noopener\">DevQualityEval<\/a> for current information on the best languages to use with specific models).<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">The weight you give to each criterion will depend on factors such as your objectives, tech stack, resources, and other variables specific to your organization.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Let\u2019s look at some scenarios as well as a few 
customers who have implemented multiple models into their workflows.<\/p>\n\n\n\n<aside class=\"wp-block-msx-kicker-container\">\n\t<div class=\"wp-block-msx-kicker wp-block-msx-kicker--align-right\" data-bi-an=\"Kicker Right\">\n\t\t<p class=\"wp-block-msx-kicker__title\">Multiple model implementations<\/p>\n\t\t<a\n\t\t\tclass=\"wp-block-msx-kicker__cta btn btn-link\"\n\t\t\thref=\"https:\/\/azure.microsoft.com\/en-us\/free\/virtual-network\/search\/?ef_id=_k_b95503a83b5a11f0161364d4342f531d_k_&#038;OCID=AIDcmm5edswduu_SEM__k_b95503a83b5a11f0161364d4342f531d_k_&#038;msclkid=b95503a83b5a11f0161364d4342f531d\"\n\t\t\ttarget=\"_blank\"\t\t>\n\t\t\t<span>Create a private virtual network in the cloud with your Azure free account<\/span> <span class=\"glyph-append glyph-append-xsmall wp-block-msx-kicker__glyph glyph-append-go\"><\/span>\n\t\t<\/a>\n\t<\/div>\n<\/aside>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scenario-1-routing\">Scenario 1: Routing<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Routing uses AI and machine learning technologies to direct each request along the most efficient path, in use cases such as call centers and logistics. Here are a few examples:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"multimodal-routing-for-diverse-data-processing\"><strong>Multimodal routing for diverse data processing<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">One innovative application of multiple model processing is to route tasks simultaneously through different multimodal models that specialize in processing specific data types such as text, images, sound, and video. For example, you can use a combination of a smaller model like GPT-3.5 Turbo and a multimodal large language model like <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/openai\/concepts\/models#gpt-4o-and-gpt-4-turbo\" target=\"_blank\" rel=\"noreferrer noopener\">GPT-4o<\/a>, depending on the modality. 
This routing allows an application to process multiple modalities by directing each type of data to the model best suited for it, thus enhancing the system\u2019s overall performance and versatility.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"expert-routing-for-specialized-domains\"><strong>Expert routing for specialized domains<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Another example is expert routing, where prompts are directed to specialized models, or \u201cexperts,\u201d based on the specific area or field referenced in the task. By implementing expert routing, companies ensure that different types of user queries are handled by the most suitable AI model or service. For instance, technical support questions might be directed to a model trained on technical documentation and support tickets, while general information requests might be handled by a more general-purpose language model.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Expert routing can be particularly useful in fields such as medicine, where different models can be fine-tuned to handle particular topics or images. Instead of relying on a single large model, multiple smaller models such as <a href=\"https:\/\/ai.azure.com\/explore\/models\/Phi-3.5-mini-instruct\/version\/6\/registry\/azureml?tid=72f988bf-86f1-41af-91ab-2d7cd011db47\">Phi-3.5-mini-instruct<\/a> and <a href=\"https:\/\/ai.azure.com\/explore\/models\/Phi-3.5-vision-instruct\/version\/2\/registry\/azureml?tid=72f988bf-86f1-41af-91ab-2d7cd011db47\">Phi-3.5-vision-instruct<\/a> might be used\u2014each optimized for a defined area like chat or vision, so that each query is handled by the most appropriate expert model, thereby enhancing the precision and relevance of the model\u2019s output. 
This approach can improve response accuracy and reduce costs associated with fine-tuning large models.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"auto-manufacturer\"><strong>Auto manufacturer<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">One example of this type of routing comes from a large auto manufacturer. They implemented a Phi model to process most basic tasks quickly while simultaneously routing more complicated tasks to a large language model like <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/openai\/concepts\/models#gpt-4o-and-gpt-4-turbo\">GPT-4o<\/a>. The Phi-3 offline model quickly handles most of the data processing locally, while the GPT online model provides the processing power for larger, more complex queries. This combination helps take advantage of the cost-effective capabilities of Phi-3, while ensuring that more complex, business-critical queries are processed effectively.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"sage\"><strong>Sage<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Another example demonstrates how industry-specific use cases can benefit from expert routing. Sage, a leader in accounting, finance, human resources, and payroll technology for small and medium-sized businesses (SMBs), wanted to help their customers discover efficiencies in accounting processes and boost productivity through AI-powered services that could automate routine tasks and provide real-time insights.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Recently, Sage deployed Mistral, a commercially available large language model, and fine-tuned it with accounting-specific data to address gaps in the GPT-4 model used for their Sage Copilot. This fine-tuning allowed Mistral to better understand and respond to accounting-related queries so it could categorize user questions more effectively and then route them to the appropriate agents or deterministic systems. 
For instance, while the out-of-the-box Mistral large language model might struggle with a cash-flow forecasting question, the fine-tuned version could accurately direct the query through both Sage-specific and domain-specific data, ensuring a precise and relevant response for the user.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scenario-2-online-and-offline-use\">Scenario 2: Online and offline use<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Online and offline scenarios allow for the dual benefits of storing and processing information locally with an offline AI model, as well as using an online AI model to access globally available data. In this setup, an organization could run a local model for specific tasks on devices (such as a customer service chatbot), while still having access to an online model that could provide data within a broader context.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"hybrid-model-deployment-for-healthcare-diagnostics\"><strong>Hybrid model deployment for healthcare diagnostics<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In the healthcare sector, AI models could be deployed in a hybrid manner to provide both online and offline capabilities. In one example, a hospital could use an offline AI model to handle initial diagnostics and data processing locally in IoT devices. Simultaneously, an online AI model could be employed to access the latest medical research from cloud-based databases and medical journals. While the offline model processes patient information locally, the online model provides globally available medical data. 
This online and offline combination helps ensure that staff can effectively conduct their patient assessments while still benefiting from access to the latest advancements in medical research.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"smart-home-systems-with-local-and-cloud-ai\"><strong>Smart-home systems with local and cloud AI<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">In smart-home systems, multiple AI models can be used to manage both online and offline tasks. An offline AI model can be embedded within the home network to control basic functions such as lighting, temperature, and security systems, enabling a quicker response and allowing essential services to operate even during internet outages. Meanwhile, an online AI model can be used for tasks that require access to cloud-based services for updates and advanced processing, such as voice recognition and smart-device integration. This dual approach allows smart home systems to maintain basic operations independently while leveraging cloud capabilities for enhanced features and updates.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"scenario-3-combining-task-specific-and-larger-models\">Scenario 3: Combining task-specific and larger models<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Companies looking to optimize cost savings could consider combining a <a href=\"https:\/\/news.microsoft.com\/source\/features\/ai\/the-phi-3-small-language-models-with-big-potential\/\" target=\"_blank\" rel=\"noreferrer noopener\">small but powerful<\/a> task-specific SLM like <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/phi\/\">Phi-3<\/a> with a robust large language model. 
One way this could work is by deploying Phi-3\u2014one of <a href=\"https:\/\/azure.microsoft.com\/en-us\/products\/phi\/\">Microsoft\u2019s family of powerful, small language models<\/a> with groundbreaking performance at low cost and low latency\u2014in edge computing scenarios or applications with stricter latency requirements, together with the processing power of a larger model like GPT.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Additionally, Phi-3 could serve as an initial filter or triage system, handling straightforward queries and only escalating more nuanced or challenging requests to GPT models. This tiered approach helps to optimize workflow efficiency and reduce unnecessary use of more expensive models.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">By thoughtfully building a setup of complementary small and large models, businesses can potentially achieve cost-effective performance tailored to their specific use cases.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"capacity\"><strong>Capacity<\/strong><\/h3>\n\n\n\n<p class=\"wp-block-paragraph\">Capacity\u2019s <a href=\"https:\/\/www.microsoft.com\/en\/customers\/story\/1700954751530838723-lucy-azure-united-states\" target=\"_blank\" rel=\"noreferrer noopener\">AI-powered Answer Engine<\/a>\u00ae retrieves exact answers for users in seconds. By leveraging cutting-edge AI technologies, Capacity gives organizations a personalized AI research assistant that can seamlessly scale across all teams and departments. They needed a way to help unify diverse datasets and make information more easily accessible and understandable for their customers. 
By leveraging Phi, Capacity was able to provide enterprises with an effective AI knowledge-management solution that enhances information accessibility, security, and operational efficiency, saving customers time and hassle. Following the successful implementation of Phi-3-Medium, Capacity is now eagerly testing the Phi-3.5-MoE model for use in production.<\/p>\n\n\n\n<aside class=\"wp-block-msx-kicker-container\">\n\t<div class=\"wp-block-msx-kicker wp-block-msx-kicker--align-right\" data-bi-an=\"Kicker Right\">\n\t\t<p class=\"wp-block-msx-kicker__title\">Phi Open Models<\/p>\n\t\t<a\n\t\t\tclass=\"wp-block-msx-kicker__cta btn btn-link\"\n\t\t\thref=\"https:\/\/azure.microsoft.com\/en-us\/products\/phi\/\"\n\t\t\ttarget=\"_blank\"\t\t>\n\t\t\t<span>Smaller, less compute-intensive models for generative AI solutions.<\/span> <span class=\"glyph-append glyph-append-xsmall wp-block-msx-kicker__glyph glyph-append-go\"><\/span>\n\t\t<\/a>\n\t<\/div>\n<\/aside>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"our-commitment-to-trustworthy-ai\">Our commitment to Trustworthy AI<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">Organizations across industries are leveraging Azure AI and Copilot capabilities to drive growth, increase productivity, and create value-added experiences.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">We\u2019re committed to helping organizations use and build <a href=\"https:\/\/blogs.microsoft.com\/blog\/2024\/09\/24\/microsoft-trustworthy-ai-unlocking-human-potential-starts-with-trust\/\" target=\"_blank\" rel=\"noreferrer noopener\">AI that is trustworthy<\/a>, meaning it is secure, private, and safe. We bring best practices and learnings from decades of researching and building AI products at scale to provide industry-leading commitments and capabilities that span our three pillars of security, privacy, and safety. 
Trustworthy AI is only possible when you combine our commitments, such as our Secure Future Initiative and our Responsible AI principles, with our product capabilities to unlock AI transformation with confidence.&nbsp;<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"get-started-with-azure-ai-foundry\">Get started with Azure AI Foundry<\/h2>\n\n\n\n<p class=\"wp-block-paragraph\">To learn more about enhancing the reliability, security, and performance of your cloud and AI investments, explore the additional resources below.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Find the ideal AI model at <a href=\"https:\/\/ai.azure.com\/\" target=\"_blank\" rel=\"noreferrer noopener\">Azure AI Foundry<\/a>.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Learn more about <a href=\"https:\/\/learn.microsoft.com\/en-us\/azure\/ai-services\/openai\/concepts\/models?tabs=python-secure%2Cglobal-standard%2Cstandard-chat-completions\" target=\"_blank\" rel=\"noreferrer noopener\">Azure OpenAI Service models<\/a>.&nbsp;<\/li>\n<\/ul>\n\n\n\n<ul class=\"wp-block-list\">\n<li class=\"wp-block-list-item\">Read about <a href=\"https:\/\/export.arxiv.org\/abs\/2404.14219\" target=\"_blank\" rel=\"noreferrer noopener\">Phi-3-mini<\/a>, which performs better than some models twice its size.&nbsp;<\/li>\n<\/ul>\n\n\n\n<div class=\"wp-block-buttons is-content-justification-center is-layout-flex wp-container-core-buttons-is-layout-a89b3969 wp-block-buttons-is-layout-flex\">\n<div class=\"wp-block-button\"><a class=\"wp-block-button__link wp-element-button\" href=\"https:\/\/azure.microsoft.com\/en-us\/products\/ai-services\/openai-service\/\">Build custom generative AI solutions with Azure OpenAI<\/a><\/div>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce 
costs.<\/p>\n","protected":false},"author":39,"featured_media":38066,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"ms_queue_id":[],"ep_exclude_from_search":false,"_classifai_error":"","_classifai_text_to_speech_error":"","_alt_title":"","footnotes":"","msx_community_cta_settings":[]},"categories":[1454],"tags":[2671,3165,3168,3167],"audience":[3072],"content-type":[1511,1527],"product":[1803,2758,1795,3164],"tech-community":[3044],"topic":[],"coauthors":[1719],"class_list":["post-38030","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-machine-learning","tag-ai","tag-language-models","tag-large-language-models-llms","tag-small-language-models-slms","audience-ai-professionals","content-type-best-practices","content-type-customer-stories","product-azure-ai","product-azure-ai-studio","product-azure-openai","product-microsoft-foundry","review-flag-1-1680286581-825","review-flag-2-1680286581-601","review-flag-3-1680286581-173","review-flag-4-1680286581-250","review-flag-5-1680286581-950","review-flag-free-1680286579-836","review-flag-iot-1680286585-835","review-flag-machi-1680286585-314","review-flag-microsofts"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.2 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Boost processing performance by combining AI models | Microsoft Azure Blog<\/title>\n<meta name=\"description\" content=\"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" 
\/>\n<meta property=\"og:title\" content=\"Boost processing performance by combining AI models | Microsoft Azure Blog\" \/>\n<meta property=\"og:description\" content=\"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\" \/>\n<meta property=\"og:site_name\" content=\"Microsoft Azure Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/microsoftazure\" \/>\n<meta property=\"article:published_time\" content=\"2025-01-08T16:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-09-10T19:54:15+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1261\" \/>\n\t<meta property=\"og:image:height\" content=\"708\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Olivia Shone\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@azure\" \/>\n<meta name=\"twitter:site\" content=\"@azure\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Olivia Shone\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\"},\"author\":[{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/olivia-shone\/\",\"@type\":\"Person\",\"@name\":\"Olivia Shone\"}],\"headline\":\"Boost processing performance by combining AI models\",\"datePublished\":\"2025-01-08T16:00:00+00:00\",\"dateModified\":\"2025-09-10T19:54:15+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\"},\"wordCount\":1709,\"commentCount\":0,\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp\",\"keywords\":[\"AI\",\"Language models\",\"Large language models (LLMs)\",\"Small language models (SLMs)\"],\"articleSection\":[\"AI + machine learning\"],\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"CommentAction\",\"name\":\"Comment\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#respond\"]}]},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\",\"name\":\"Boost processing performance by combining AI models | Microsoft Azure 
Blog\",\"isPartOf\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp\",\"datePublished\":\"2025-01-08T16:00:00+00:00\",\"dateModified\":\"2025-09-10T19:54:15+00:00\",\"description\":\"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.\",\"breadcrumb\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp\",\"width\":1261,\"height\":708,\"caption\":\"Man looking at laptop\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Blog home\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI + machine 
learning\",\"item\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Boost processing performance by combining AI models\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#website\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"name\":\"Microsoft Azure Blog\",\"description\":\"Get the latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.\",\"publisher\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization\",\"name\":\"Microsoft Azure Blog\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"contentUrl\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp\",\"width\":512,\"height\":512,\"caption\":\"Microsoft Azure 
Blog\"},\"image\":{\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/microsoftazure\",\"https:\/\/x.com\/azure\",\"https:\/\/www.instagram.com\/microsoftdeveloper\/\",\"https:\/\/www.linkedin.com\/company\/16188386\",\"https:\/\/www.youtube.com\/user\/windowsazure\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/dddfb06db704f28e44dc633b15e0d6ae\",\"name\":\"Brianna McGovern\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g5fc6a76f72449f78acaf535ec3e0c54f\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g\",\"caption\":\"Brianna McGovern\"},\"url\":\"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/briannamcgovern\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Boost processing performance by combining AI models | Microsoft Azure Blog","description":"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/","og_locale":"en_US","og_type":"article","og_title":"Boost processing performance by combining AI models | Microsoft Azure Blog","og_description":"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.","og_url":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/","og_site_name":"Microsoft Azure Blog","article_publisher":"https:\/\/www.facebook.com\/microsoftazure","article_published_time":"2025-01-08T16:00:00+00:00","article_modified_time":"2025-09-10T19:54:15+00:00","og_image":[{"width":1261,"height":708,"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.png","type":"image\/png"}],"author":"Olivia Shone","twitter_card":"summary_large_image","twitter_creator":"@azure","twitter_site":"@azure","twitter_misc":{"Written by":"Olivia Shone","Est. 
reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#article","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/"},"author":[{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/olivia-shone\/","@type":"Person","@name":"Olivia Shone"}],"headline":"Boost processing performance by combining AI models","datePublished":"2025-01-08T16:00:00+00:00","dateModified":"2025-09-10T19:54:15+00:00","mainEntityOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/"},"wordCount":1709,"commentCount":0,"publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp","keywords":["AI","Language models","Large language models (LLMs)","Small language models (SLMs)"],"articleSection":["AI + machine learning"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/","name":"Boost processing performance by combining AI models | Microsoft Azure 
Blog","isPartOf":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage"},"thumbnailUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp","datePublished":"2025-01-08T16:00:00+00:00","dateModified":"2025-09-10T19:54:15+00:00","description":"Look at how a multiple model approach works and companies successfully implemented this approach to increase performance and reduce costs.","breadcrumb":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#primaryimage","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2025\/01\/USE_Azure_479781_Blog_250107.webp","width":1261,"height":708,"caption":"Man looking at laptop"},{"@type":"BreadcrumbList","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/boost-processing-performance-by-combining-ai-models\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Blog home","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/"},{"@type":"ListItem","position":2,"name":"AI + machine learning","item":"https:\/\/azure.microsoft.com\/en-us\/blog\/category\/ai-machine-learning\/"},{"@type":"ListItem","position":3,"name":"Boost processing performance by combining AI 
models"}]},{"@type":"WebSite","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#website","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","name":"Microsoft Azure Blog","description":"Get the latest Azure news, updates, and announcements from the Azure blog. From product updates to hot topics, hear from the Azure experts.","publisher":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/azure.microsoft.com\/en-us\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#organization","name":"Microsoft Azure Blog","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","contentUrl":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-content\/uploads\/2024\/06\/microsoft_logo.webp","width":512,"height":512,"caption":"Microsoft Azure Blog"},"image":{"@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/microsoftazure","https:\/\/x.com\/azure","https:\/\/www.instagram.com\/microsoftdeveloper\/","https:\/\/www.linkedin.com\/company\/16188386","https:\/\/www.youtube.com\/user\/windowsazure"]},{"@type":"Person","@id":"https:\/\/azure.microsoft.com\/en-us\/blog\/#\/schema\/person\/dddfb06db704f28e44dc633b15e0d6ae","name":"Brianna 
McGovern","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g5fc6a76f72449f78acaf535ec3e0c54f","url":"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/471211b4d059ccb73aa3fda768b31973fb946424996c0376f7f0be3cb919d469?s=96&d=mm&r=g","caption":"Brianna McGovern"},"url":"https:\/\/azure.microsoft.com\/en-us\/blog\/author\/briannamcgovern\/"}]}},"msxcm_display_generated_audio":false,"msxcm_animated_featured_image":null,"distributor_meta":false,"distributor_terms":false,"distributor_media":false,"distributor_original_site_name":"Microsoft Azure Blog","distributor_original_site_url":"https:\/\/azure.microsoft.com\/en-us\/blog","push-errors":false,"_links":{"self":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/38030","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/users\/39"}],"replies":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/comments?post=38030"}],"version-history":[{"count":0,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/posts\/38030\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media\/38066"}],"wp:attachment":[{"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/media?parent=38030"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/categories?post=38030"},{"taxonomy":"post_tag","embeddable":true,"hr
ef":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tags?post=38030"},{"taxonomy":"audience","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/audience?post=38030"},{"taxonomy":"content-type","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/content-type?post=38030"},{"taxonomy":"product","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/product?post=38030"},{"taxonomy":"tech-community","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/tech-community?post=38030"},{"taxonomy":"topic","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/topic?post=38030"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/azure.microsoft.com\/en-us\/blog\/wp-json\/wp\/v2\/coauthors?post=38030"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}