Azure OpenAI Service pricing overview
To help you along your journey, we offer pricing and cost management options to meet your needs, including:
- Standard (On-Demand): Pay-as-you-go for input and output tokens.
- Provisioned (PTUs): Allocate throughput with predictable costs, with monthly and annual reservations available to reduce overall spend.
- Batch API: Language models are also available through the Batch API for global deployments and in three regions. It returns completions within 24 hours at a 50% discount on Global Standard pricing.
- Global Deployment – Global SKU
- Data Zone Deployment – Geographic based (EU or US)
- Regional Deployment – Local Region (up to 27 regions)
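The Batch API discount in the list above is simple arithmetic. The sketch below applies the stated 50% discount to a Global Standard rate. The function name `batch_price` and the example rate are hypothetical, since this page does not show actual prices.

```python
from decimal import Decimal

# 50% discount on Global Standard pricing, as stated in the list above.
BATCH_DISCOUNT = Decimal("0.50")


def batch_price(global_standard_price: Decimal) -> Decimal:
    """Return the Batch API price for a Global Standard price per 1M tokens."""
    if global_standard_price < 0:
        raise ValueError("price must be non-negative")
    return global_standard_price * (Decimal("1") - BATCH_DISCOUNT)


# Hypothetical rate for illustration only; use the figure from the pricing table.
example_rate = Decimal("10.00")
print(f"Batch price per 1M tokens: ${batch_price(example_rate):.2f}")
```

Decimal arithmetic is used here to avoid floating-point rounding in currency values.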
Explore pricing options
Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered with Microsoft, date of purchase, and the currency exchange rate. Prices are calculated based on US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the upcoming month. Sign in to the Azure pricing calculator to see pricing based on your current program/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.
US government entities are eligible to purchase Azure Government services from a licensing solution provider with no upfront financial commitment, or directly through a pay-as-you-go online subscription.
Important—The price in R$ is merely a reference; this is an international transaction and the final price is subject to exchange rates and the inclusion of IOF taxes. An eNF will not be issued.
o1
o1 is the new reasoning model series for complex tasks. The model has 200K context and an October 2023 knowledge cutoff.
Model | Pricing (1M Tokens) |
---|---|
Global | Input: $- Cached Input: $- Output: $- |
US/EU – Data Zones | Input: $- Cached Input: $- Output: $- |
Regional | Input: $- Cached Input: $- Output: $- |
Plan with the Pricing Calculator
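As a rough illustration of how per-1M-token rates with separate input, cached input, and output prices add up, the sketch below estimates the cost of a workload. The `TokenRates` class, the `estimate_cost` function, and every rate in it are hypothetical. It also assumes that cached input tokens are billed at the cached rate and counted separately from uncached input tokens.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # the tables quote prices per 1M tokens


@dataclass(frozen=True)
class TokenRates:
    """Prices in USD per 1M tokens. Values are placeholders, not real quotes."""
    input: float
    cached_input: float
    output: float


def estimate_cost(rates: TokenRates, input_tokens: int,
                  cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate cost, assuming uncached and cached input are counted separately."""
    total = (
        input_tokens * rates.input
        + cached_input_tokens * rates.cached_input
        + output_tokens * rates.output
    )
    return total / TOKENS_PER_UNIT


# Hypothetical rates for illustration only.
rates = TokenRates(input=10.0, cached_input=5.0, output=40.0)
print(estimate_cost(rates, input_tokens=1_000_000,
                    cached_input_tokens=500_000, output_tokens=250_000))
```

The same function works for any deployment type (Global, Data Zone, or Regional) by passing that deployment's rates.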
o1 Mini
o1-mini is a fast, cost-efficient reasoning model tailored to coding, math, and science use cases. The model has 128K context and an October 2023 knowledge cutoff.
Model | Pricing (1M Tokens) |
---|---|
Global | Input: $- Cached Input: $- Output: $- |
US/EU – Data Zones | Input: $- Cached Input: $- Output: $- |
Regional | Input: $- Cached Input: $- Output: $- |
Realtime API
Featured in the Realtime API, the GPT-4o-Realtime-Preview supports multilingual speech-to-speech capabilities. Optimized for real-time, low-latency conversations, it enables natural interactions with minimal delay, ideal for chatbots and conversational AI.
Model | Pricing (1M Tokens) |
---|---|
GPT-4o-Realtime-Preview-Global | Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- |
GPT-4o-Realtime-Preview-US/EU – Data Zones | Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- |
GPT-4o-Realtime-Preview-Regional | Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- |
Chat Completions API - Coming soon
Featured in the Chat Completions API, the GPT-4o-Audio-Preview model processes and generates audio content. It supports advanced features like speech recognition and audio synthesis, ideal for asynchronous speech interactions and sentiment analysis. Cached input is coming soon.
Model | Pricing (1M Tokens) |
---|---|
GPT-4o-Audio-Preview-Global | Text Input: $- Cached Input: N/A Output: $- Audio Input: $- Cached Input: N/A Output: $- |
GPT-4o
GPT-4o is the most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
Model | Pricing (1M Tokens) | Pricing with Batch API (1M Tokens) |
---|---|---|
GPT-4o-2024-08-06 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
GPT-4o-2024-08-06 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | N/A |
GPT-4o-2024-08-06 Regional | Input: $- Cached Input: $- Output: $- | N/A |
GPT-4o-0513 Global | Input: $- Output: $- | Input: $- Output: $- |
GPT-4o-0513 US/EU – Data Zones | Input: $- Output: $- | N/A |
GPT-4o-0513 Regional | Input: $- Output: $- | N/A |
GPT-4o mini
GPT-4o mini is the most cost-efficient small model, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.
Model | Pricing (1M Tokens) | Pricing with Batch API (1M Tokens) |
---|---|---|
Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | N/A |
Regional | Input: $- Cached Input: $- Output: $- | N/A |
Provisioned
You can allocate and manage throughput for deployments, ensuring predictable performance and stable capacity. You are charged an hourly rate per model regardless of usage, but you can also secure additional savings through monthly and annual reservations. Discover how to transition your regional deployments and provisioned reservations to global and data zones on this Learn page.
Model | Min PTUs | PTU Hourly pricing | PTU Monthly Reservation Pricing | PTU Yearly Reservation Pricing |
---|---|---|---|---|
GPT-4o Global | 15 | $- | $- | $- |
GPT-4o US/EU Data Zones | 15 | $- | $- | $- |
GPT-4o Regional | 50 | $- | $- | $- |
GPT-4o Mini Global | 15 | $- | $- | $- |
GPT-4o Mini US/EU Data Zones | 15 | $- | $- | $- |
GPT-4o Mini Regional | 25 | $- | $- | $- |
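The minimum PTU values in the table above can be checked before estimating an hourly bill. The sketch below is one way to do that. The function name `hourly_deployment_cost` is hypothetical, the rate is a placeholder, and it assumes the hourly price is quoted per PTU and charged for every hour the deployment exists, regardless of usage.

```python
# Minimum PTUs per (model, deployment type), copied from the table above.
MIN_PTUS = {
    ("GPT-4o", "Global"): 15,
    ("GPT-4o", "US/EU Data Zones"): 15,
    ("GPT-4o", "Regional"): 50,
    ("GPT-4o Mini", "Global"): 15,
    ("GPT-4o Mini", "US/EU Data Zones"): 15,
    ("GPT-4o Mini", "Regional"): 25,
}


def hourly_deployment_cost(model: str, deployment: str, ptus: int,
                           hourly_rate_per_ptu: float, hours: float) -> float:
    """Estimate the hourly-billed cost, assuming the rate is per PTU per hour."""
    minimum = MIN_PTUS[(model, deployment)]
    if ptus < minimum:
        raise ValueError(
            f"{model} {deployment} requires at least {minimum} PTUs, got {ptus}"
        )
    return ptus * hourly_rate_per_ptu * hours


# Hypothetical rate of $1.00 per PTU per hour, for illustration only.
print(hourly_deployment_cost("GPT-4o", "Regional", 50, 1.0, hours=24))
```

Monthly and yearly reservations would replace the hourly rate with the reservation price; those figures are not shown on this page, so they are left out of the sketch.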
Base models
Models | Usage per 1,000 tokens |
---|---|
Babbage-002 | $- |
Davinci-002 | $- |
Fine-tuning models
Models | Training per 1,000 tokens | Hosting per hour | Input Usage per 1,000 tokens | Output Usage per 1,000 tokens | Cached Input per 1,000 tokens |
---|---|---|---|---|---|
Babbage-002 | $- | $- | $- | $- | N/A |
Davinci-002 | $- | $- | $- | $- | N/A |
GPT-3.5-Turbo (4K) | $- | $- | $- | $- | N/A |
GPT-3.5-Turbo (16K) | $- | $- | $- | $- | N/A |
GPT-4 (8K) | $- | $- | $- | $- | N/A |
GPT-4o | $- | $- | $- | $- | N/A |
GPT-4o-mini | $- | $- | $- | $- | $- |
GPT-4o-0806 | N/A | N/A | N/A | N/A | $- |
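Fine-tuning combines a one-time training charge, an hourly hosting charge, and per-token usage charges. The sketch below adds the three components using the per-1,000-token units from the table above. The function name `fine_tuning_cost` and all rates are hypothetical placeholders.

```python
TOKENS_PER_UNIT = 1_000  # the fine-tuning table quotes prices per 1,000 tokens


def fine_tuning_cost(training_tokens: int, hosting_hours: float,
                     input_tokens: int, output_tokens: int, *,
                     training_rate: float, hosting_rate: float,
                     input_rate: float, output_rate: float) -> dict:
    """Break an estimated fine-tuning bill into training, hosting, and usage."""
    training = training_tokens / TOKENS_PER_UNIT * training_rate
    hosting = hosting_hours * hosting_rate
    usage = (input_tokens / TOKENS_PER_UNIT * input_rate
             + output_tokens / TOKENS_PER_UNIT * output_rate)
    return {
        "training": training,
        "hosting": hosting,
        "usage": usage,
        "total": training + hosting + usage,
    }


# Hypothetical rates for illustration only.
estimate = fine_tuning_cost(
    2_000_000, 100, 500_000, 100_000,
    training_rate=0.01, hosting_rate=2.0, input_rate=0.001, output_rate=0.002,
)
print(estimate)
```

Hosting accrues per hour regardless of traffic, so it is often the largest component for a lightly used fine-tuned deployment.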
Assistants API
The Assistants API and its tools make it easy for developers to build AI Assistants in their applications.
The tokens used for the Assistants API are billed at the chosen language model's per token input/output rates used with each Assistant. Additionally, we charge the following fees for tool usage:
Tool | Input |
---|---|
File Search* | $-/GB of vector-storage per day (1 GB free) |
Code Interpreter** | $-/session |
*GB refers to binary gigabytes, where 1 GB is 2^30 bytes.
**If your assistant calls Code Interpreter simultaneously in two different threads, this creates two Code Interpreter sessions (2 * $-). Each session is active by default for one hour, so you pay this fee only once if your user keeps giving instructions to Code Interpreter in the same thread for up to one hour.
Inference cost (input and output) varies based on the GPT model used with each Assistant.
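The tool fee rules above can be sketched in code. The example below counts billable Code Interpreter sessions from a list of calls and computes billable File Search storage. The function names are hypothetical, and the sketch assumes a session lasts a fixed one hour from the first call in a thread (not a sliding window), which this page does not spell out.

```python
from collections import defaultdict
from datetime import datetime, timedelta

SESSION_LENGTH = timedelta(hours=1)  # default session lifetime stated above
BYTES_PER_GB = 2 ** 30               # binary gigabyte, per the footnote
FREE_GB = 1                          # first GB of vector storage is free


def count_sessions(calls):
    """Count sessions from (thread_id, timestamp) pairs.

    Assumes a fixed one-hour window per thread starting at the first call.
    """
    by_thread = defaultdict(list)
    for thread_id, timestamp in calls:
        by_thread[thread_id].append(timestamp)

    sessions = 0
    for timestamps in by_thread.values():
        session_end = None
        for timestamp in sorted(timestamps):
            if session_end is None or timestamp >= session_end:
                sessions += 1
                session_end = timestamp + SESSION_LENGTH
    return sessions


def billable_storage_gb(stored_bytes: int) -> float:
    """Return GB of vector storage billed per day after the free allowance."""
    return max(0.0, stored_bytes / BYTES_PER_GB - FREE_GB)


start = datetime(2024, 1, 1, 9, 0)
calls = [
    ("thread-a", start),
    ("thread-b", start),                          # second thread, second session
    ("thread-a", start + timedelta(minutes=30)),  # covered by thread-a's session
    ("thread-a", start + timedelta(minutes=90)),  # window expired, new session
]
print(count_sessions(calls), billable_storage_gb(3 * BYTES_PER_GB))
```

Multiplying the session count by the per-session fee, and the billable GB by the daily storage rate, gives the tool charges that are added to the model's token costs.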
Image models
Models | Quality | Resolution | Price (per 100 images) |
---|---|---|---|
Dall-E-3 | Standard | 1024 * 1024 | $- |
Dall-E-3 | Standard | 1024 * 1792, 1792 * 1024 | $- |
Dall-E-3 | HD | 1024 * 1024 | $- |
Dall-E-3 | HD | 1024 * 1792, 1792 * 1024 | $- |
Dall-E-2 | Standard | 1024 * 1024 | $- |
Embedding models
Models | Per 1,000 tokens |
---|---|
Ada | $- |
text-embedding-3-large | $- |
text-embedding-3-small | $- |
Speech Models
Models | Price |
---|---|
Whisper | $-/hour |
TTS (Text to Speech) | $-/1M characters |
TTS HD | $-/1M characters |
Legacy Language Models
Models | Context | Input (Per 1M Tokens) | Output (Per 1M Tokens) |
---|---|---|---|
GPT-3.5-Turbo-0301 | 4K | $- | $- |
GPT-3.5-Turbo-0613 | 4K | $- | $- |
GPT-3.5-Turbo-0613 | 16K | $- | $- |
GPT-3.5-Turbo-1106 | 16K | $- | $- |
GPT-3.5-Turbo-0125 | 16K | $- | $- |
GPT-3.5-Turbo-Instruct | 4K | $- | $- |
GPT-4-Turbo | 128K | $- | $- |
GPT-4-Turbo-Vision | 128K | $- | $- |
GPT-4 | 8K | $- | $- |
GPT-4 | 32K | $- | $- |
Azure pricing and purchasing options
Connect with us directly
Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimization and request a custom proposal.
Talk to a sales specialist
See ways to purchase
Purchase Azure services through the Azure website, a Microsoft representative, or an Azure partner.
Explore your options
Additional resources
Azure OpenAI Service
Learn more about Azure OpenAI Service features and capabilities.
Pricing calculator
Estimate your expected monthly costs for using any combination of Azure products.
SLA
Review the Service Level Agreement for Azure OpenAI Service.
Documentation
Review technical tutorials, videos, and more Azure OpenAI Service resources.
Frequently asked questions
- Azure OpenAI Service offers pricing based on both Pay-As-You-Go and Provisioned Throughput Units (PTUs). Pay-As-You-Go lets you pay only for the resources you consume, making it flexible for variable workloads. PTUs offer a predictable pricing model where you reserve and deploy a specific amount of model processing capacity. This model is ideal for workloads with consistent or predictable usage patterns, providing stability and cost control.
- To learn more about PTUs and Azure OpenAI pricing, please read the PTU documentation or contact our sales specialist.
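One way to compare the two pricing models described above is to find the monthly token volume at which Pay-As-You-Go spending equals a fixed provisioned cost. The sketch below does that under loud assumptions: the function name `break_even_tokens`, the reservation cost, and the blended per-1M-token rate are all hypothetical, and the calculation ignores whether the reserved PTUs can actually serve that volume.

```python
TOKENS_PER_UNIT = 1_000_000


def break_even_tokens(monthly_provisioned_cost: float,
                      blended_rate_per_million: float) -> float:
    """Return the monthly token volume where Pay-As-You-Go cost equals the
    fixed provisioned cost. Below this volume Pay-As-You-Go is cheaper."""
    if blended_rate_per_million <= 0:
        raise ValueError("blended rate must be positive")
    return monthly_provisioned_cost / blended_rate_per_million * TOKENS_PER_UNIT


# Hypothetical figures: $3,000/month provisioned, $6 per 1M tokens blended.
volume = break_even_tokens(3000.0, 6.0)
print(f"Break-even at about {volume:,.0f} tokens per month")
```

A blended rate is a simplification that folds input and output prices into one number; a real comparison should use the actual input/output mix and the throughput each PTU delivers for the chosen model.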
Talk to a sales specialist for a walk-through of Azure pricing. Understand pricing for your cloud solution.
Get free cloud services and a $200 credit to explore Azure for 30 days.