Over 60,000 customers, including AT&T, H&R Block, Volvo, Grammarly, Harvey, Leya, and more, leverage Microsoft Azure AI to drive AI transformation. We are excited to see the growing adoption of AI across industries and businesses, small and large. This blog summarizes new capabilities across the Azure AI portfolio that provide greater choice and flexibility to build and scale AI solutions. Key updates include:
- Availability of Azure OpenAI Data Zones for the United States and European Union that offer greater flexibility in deployment options.
- A 99% SLA on token generation, general availability of the Azure OpenAI Service Batch API, availability of Prompt Caching, a 50% price reduction for models through Provisioned Global, and lower deployment minimums for Provisioned Global GPT-4o models to scale efficiently and optimize costs.
- New models, including Healthcare industry models, the Ministral 3B small model from Mistral AI, and Cohere Embed 3, plus general availability of fine-tuning for the Phi-3.5 family, providing greater choice and customization.
- A seamless upgrade path from GitHub Models to the Azure AI model inference API, and availability of AI App Templates, to accelerate AI development.
- New enterprise-ready features to build with AI safely.
Azure OpenAI Data Zones for the United States and European Union
We are thrilled to announce Azure OpenAI Data Zones, a new deployment option that provides enterprises with even more flexibility and control over their data privacy and residency needs. Tailored for organizations in the United States and European Union, Data Zones allow customers to process and store their data within specific geographic boundaries, ensuring compliance with regional data residency requirements while maintaining optimal performance. By spanning multiple regions within these areas, Data Zones offer a balance between the cost-efficiency of global deployments and the control of regional deployments, making it easier for enterprises to manage their AI applications without sacrificing security or speed.
This new feature simplifies the often-complex task of managing data residency by offering a solution that allows for higher throughput and faster access to the latest AI models, including the newest innovations from Azure OpenAI Service. Enterprises can now take advantage of Azure’s robust infrastructure to securely scale their AI solutions while meeting stringent data residency requirements. Data Zones is available today for Standard (pay-as-you-go) deployments and is coming soon for Provisioned deployments.
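For developers, nothing changes at the API layer: the Data Zone choice is made when the deployment is created, and application code calls the deployment as usual. Below is a minimal sketch using the openai Python package; the endpoint, key, and deployment name (gpt-4o-datazone) are placeholders for your own values.

```python
import os

from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

response = client.chat.completions.create(
    model="gpt-4o-datazone",  # hypothetical name of a Data Zone Standard deployment
    messages=[{"role": "user", "content": "Summarize our EU data residency requirements."}],
)
print(response.choices[0].message.content)
```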
Azure OpenAI Service updates
Earlier this month, we announced general availability of the Azure OpenAI Batch API for Global deployments. With the Azure OpenAI Batch API, developers can manage large-scale, high-volume processing tasks more efficiently, with separate quota and a 24-hour turnaround time, at 50% less cost than Standard Global. Ontada, an entity within McKesson, is already leveraging the Batch API to process large volumes of patient data across oncology centers in the United States efficiently and cost effectively.
“Ontada is at the unique position of serving providers, patients and life science partners with data-driven insights. We leverage the Azure OpenAI Batch API to process tens of millions of unstructured documents efficiently, enhancing our ability to extract valuable clinical information. What would have taken months to process now takes just a week. This significantly improves evidence-based medicine practice and accelerates life science product R&D. Partnering with Microsoft, we are advancing AI-driven oncology research, aiming for breakthroughs in personalized cancer care and drug development.” — Sagran Moodley, Chief Innovation and Technology Officer, Ontada
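For reference, submitting a batch job follows the upload-then-create pattern of the Batch API. The sketch below uses the openai Python package against an Azure OpenAI resource; the file name, deployment name, and API version are assumptions, so substitute values from your own setup.

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",  # assumption: use any API version that supports Batch
)

# Each line of requests.jsonl is one request, for example:
# {"custom_id": "doc-1", "method": "POST", "url": "/chat/completions",
#  "body": {"model": "gpt-4o-batch", "messages": [{"role": "user", "content": "..."}]}}
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/chat/completions",
    completion_window="24h",  # results are returned within 24 hours
)
print(batch.id, batch.status)  # poll later with client.batches.retrieve(batch.id)
```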
We have also enabled Prompt Caching for the o1-preview, o1-mini, GPT-4o, and GPT-4o-mini models on Azure OpenAI Service. With Prompt Caching, developers can optimize costs and latency by reusing recently seen input tokens. This feature is particularly useful for applications that use the same context repeatedly, such as code editing or long conversations with chatbots. Prompt Caching offers a 50% discount on cached input tokens on the Standard offering, along with faster processing times.
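Caching is applied automatically to long, repeated prompt prefixes, so the main lever developers have is prompt structure: keep stable content (system prompt, shared documents or code) at the front and vary only the tail. A minimal sketch, with the deployment name and context file assumed for illustration:

```python
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-21",
)

shared_context = open("repo_context.txt").read()  # stable content reused across calls

response = client.chat.completions.create(
    model="gpt-4o",  # deployment name is an assumption
    messages=[
        {"role": "system", "content": shared_context},  # stable prefix: cacheable
        {"role": "user", "content": "Refactor the parse() function."},  # varying suffix
    ],
)
# Recent SDK versions report the cache hit in the usage details.
print("cached prompt tokens:", response.usage.prompt_tokens_details.cached_tokens)
```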
For the Provisioned Global deployment offering, we are lowering the initial deployment quantity for GPT-4o models to 15 Provisioned Throughput Units (PTUs), with additional increments of 5 PTUs. We are also lowering the price for Provisioned Global Hourly by 50% to broaden access to Azure OpenAI Service. Learn more here about managing costs for AI deployments.
In addition, we’re introducing a 99% latency service level agreement (SLA) for token generation. This latency SLA ensures that tokens are generated at faster and more consistent speeds, especially at high volumes.
New models and customization
We continue to expand model choice with the addition of new models to the model catalog. We have several new models available this month, including Healthcare industry models and models from Mistral and Cohere. We are also announcing customization capabilities for the Phi-3.5 family of models.
- Healthcare industry models, comprising advanced multimodal medical imaging models, including MedImageInsight for image analysis, MedImageParse for image segmentation across imaging modalities, and CXRReportGen, which can generate detailed structured reports. Developed in collaboration with Microsoft Research and industry partners, these models are designed to be fine-tuned and customized by healthcare organizations to meet specific needs, reducing the computational and data requirements typically needed to build such models from scratch. Explore them today in the Azure AI model catalog.
- Ministral 3B from Mistral AI: Ministral 3B represents a significant advancement in the sub-10B category, focusing on knowledge, commonsense reasoning, function-calling, and efficiency. With support for up to 128k context length, it is tailored for a diverse array of applications, from orchestrating agentic workflows to developing specialized task workers. When used alongside larger language models like Mistral Large, Ministral 3B can serve as an efficient intermediary for function-calling in multi-step agentic workflows.
- Cohere Embed 3: Embed 3, Cohere’s industry-leading AI search model, is now available in the Azure AI model catalog, and it’s multimodal! With the ability to generate embeddings from both text and images, Embed 3 unlocks significant value for enterprises by allowing them to search and analyze their vast amounts of data, no matter the format. This upgrade positions Embed 3 as the most powerful and capable multimodal embedding model on the market, transforming how businesses search through complex assets like reports, product catalogs, and design files (see the embeddings sketch after this list).
- Fine-tuning general availability for the Phi-3.5 family, including Phi-3.5-mini and Phi-3.5-MoE. Phi family models are well suited to customization to improve base model performance across a variety of scenarios, including learning a new skill or task, or enhancing the consistency and quality of responses. Given their small compute footprint as well as cloud and edge compatibility, Phi-3.5 models offer a cost-effective and sustainable alternative compared to models of the same size or the next size up. We’re already seeing adoption of the Phi-3.5 family for use cases including edge reasoning as well as non-connected scenarios. Developers can fine-tune Phi-3.5-mini and Phi-3.5-MoE today through the models-as-a-platform offering, using serverless endpoints.
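As a concrete example of calling one of these catalog models, the sketch below generates Embed 3 text embeddings from a serverless endpoint using the azure-ai-inference package. The endpoint URL and key are placeholders from a hypothetical deployment; image embeddings would use the package’s separate image embeddings client.

```python
import os

from azure.ai.inference import EmbeddingsClient  # pip install azure-ai-inference
from azure.core.credentials import AzureKeyCredential

client = EmbeddingsClient(
    endpoint=os.environ["EMBED_ENDPOINT"],  # placeholder: your serverless endpoint URL
    credential=AzureKeyCredential(os.environ["EMBED_KEY"]),
)

docs = ["Q3 product catalog, outdoor line", "Design spec: hinge assembly v2"]
response = client.embed(input=docs)

for item in response.data:
    print(item.index, len(item.embedding), item.embedding[:3])
```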
AI app development
We are building Azure AI to be an open, modular platform, so developers can go from idea to code to cloud quickly. Developers can now explore and access Azure AI models directly through the GitHub Marketplace via the Azure AI model inference API. Developers can try different models and compare model performance in the playground for free (usage limits apply), and when they’re ready to customize and deploy, they can seamlessly set up and log in to their Azure account to scale from free token usage to paid endpoints with enterprise-level security and monitoring, without changing anything else in the code.
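In practice, that swap looks like the sketch below, which uses the azure-ai-inference Python package: point the client at the GitHub Models endpoint with a GitHub token for free experimentation, then change only the endpoint and credential to move to a paid Azure deployment. The model name and environment variables are assumptions.

```python
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://models.inference.ai.azure.com",  # GitHub Models endpoint
    credential=AzureKeyCredential(os.environ["GITHUB_TOKEN"]),
)
# To move to a paid Azure endpoint, change only the endpoint and credential;
# the rest of the code stays the same.

response = client.complete(
    model="gpt-4o-mini",  # catalog model name (assumption)
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Write a haiku about data residency."),
    ],
)
print(response.choices[0].message.content)
```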
We also announced AI App Templates to speed up AI app development. Developers can use these templates in GitHub Codespaces, VS Code, and Visual Studio. The templates offer flexibility with various models, frameworks, languages, and solutions from providers like Arize, LangChain, LlamaIndex, and Pinecone. Developers can deploy full apps or start with components, provisioning resources across Azure and partner services.
Our mission is to empower developers across the globe to build with AI. With these updates, developers can quickly get started in their preferred environment, choose the deployment option that best fits their needs, and scale AI solutions with confidence.
New features to build secure, enterprise-ready AI apps
At Microsoft, we’re focused on helping customers use and build AI that is trustworthy, meaning AI that is secure, safe, and private. Today, I am excited to share two new capabilities to build and scale AI solutions confidently.
The Azure AI model catalog offers over 1,700 models for developers to explore, evaluate, customize, and deploy. While this vast selection empowers innovation and flexibility, it can also present significant challenges for enterprises that want to ensure all deployed models align with their internal policies, security standards, and compliance requirements. Now, Azure AI administrators can use Azure Policy to pre-approve select models for deployment from the Azure AI model catalog, simplifying model selection and governance processes. This includes pre-built policies for Models-as-a-Service (MaaS) and Models-as-a-Platform (MaaP) deployments, while a detailed guide facilitates the creation of custom policies for Azure OpenAI Service and other AI services. Together, these policies provide complete coverage for creating an allowed-model list and enforcing it across Azure Machine Learning and Azure AI Studio.
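As an illustration of what enforcement can look like, the sketch below assigns a policy at resource-group scope with the azure-mgmt-resource package. The policy definition ID and the allowedModels parameter name are placeholders, not the real pre-built policy values; take those from the policy you actually adopt.

```python
from azure.identity import DefaultAzureCredential  # pip install azure-identity azure-mgmt-resource
from azure.mgmt.resource.policy import PolicyClient
from azure.mgmt.resource.policy.models import PolicyAssignment

subscription_id = "<subscription-id>"
client = PolicyClient(DefaultAzureCredential(), subscription_id)

scope = f"/subscriptions/{subscription_id}/resourceGroups/<resource-group>"
assignment = client.policy_assignments.create(
    scope,
    "allowed-ai-models",
    PolicyAssignment(
        # Placeholder: use the definition ID of the pre-built policy you adopt.
        policy_definition_id="/providers/Microsoft.Authorization/policyDefinitions/<definition-guid>",
        # Hypothetical parameter name; the real policy defines its own parameters.
        parameters={"allowedModels": {"value": ["gpt-4o", "Phi-3.5-mini"]}},
    ),
)
print(assignment.name)
```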
To customize models and applications, developers may need access to resources located on-premises, or to resources that are not supported with private endpoints but are still located in their custom Azure virtual network (VNET). Application Gateway is a load balancer that makes routing decisions based on the URL of an HTTPS request. Application Gateway supports a private connection from the managed VNET to any resource using the HTTP or HTTPS protocol. Today, it is verified to support a private connection to JFrog Artifactory, Snowflake Database, and private APIs. With Application Gateway in Azure Machine Learning and Azure AI Studio, now available in public preview, developers can access on-premises or custom VNET resources for their training, fine-tuning, and inferencing scenarios without compromising their security posture.
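As a rough sketch of the setup, the azure-ai-ml SDK can add an outbound private endpoint rule on the workspace’s managed network that targets the Application Gateway’s private frontend. The resource IDs below are placeholders, and the subresource target value should be taken from the feature documentation.

```python
from azure.ai.ml import MLClient  # pip install azure-ai-ml
from azure.ai.ml.entities import PrivateEndpointDestination
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

ws = ml_client.workspaces.get("<workspace-name>")
ws.managed_network.outbound_rules.append(
    PrivateEndpointDestination(
        name="appgw-private-connection",
        # Placeholder resource ID of your Application Gateway.
        service_resource_id="/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.Network/applicationGateways/<appgw-name>",
        subresource_target="appGwPrivateFrontendIpIPv4",  # assumed target value; see the docs
        spark_enabled=False,
    )
)
ml_client.workspaces.begin_update(ws)  # returns a poller; call .result() to wait
```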
Start today with Azure AI
It has been an incredible six months being here at Azure AI, delivering state-of-the-art AI innovation, seeing developers build transformative experiences using our tools, and learning from our customers and partners. I am excited for what comes next. Join us at Microsoft Ignite 2024 to hear about the latest from Azure AI.
Additional resources:
- Get started with Azure OpenAI Service.
- Get started with fine-tuning with Phi-3.
- Learn more about Trustworthy AI.