Azure OpenAI Service

Azure OpenAI Service pricing overview

Azure OpenAI Service delivers enterprise-ready generative AI featuring powerful models from OpenAI, enabling organisations to innovate with text, audio and vision capabilities. Beyond the cutting-edge models, companies choose Azure OpenAI Service for built-in data privacy, regional/area/global flexibility, and seamless integration into the Azure ecosystem including Fabric, Cosmos DB and Azure AI Search. Companies of all sizes can confidently scale AI solutions to enhance customer experience, automate workflows, and unlock creative potential, driving measurable impact and competitive differentiation.

To support you on that journey, we offer pricing and cost management options to meet your needs, including:

Standard (On-Demand): Pay-as-you-go for input and output tokens.
Provisioned (PTUs): Allocate throughput with predictable costs, with monthly and annual reservations available to reduce overall spend.
Batch API: Language models are also available through the Batch API for global deployments and in three regions. Batch returns completions within 24 hours at a 50% discount on Global Standard pricing.
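As a rough illustration of how the Standard and Batch options relate, the sketch below computes a pay-as-you-go bill from token counts and then applies the 50% Batch discount. The rates are hypothetical placeholders rather than real prices, the names `TokenRates`, `standard_cost` and `batch_cost` are invented for this example, and modelling Batch as a flat discount on the Global Standard rate is an assumption drawn from the description above.

```python
from dataclasses import dataclass

TOKENS_PER_PRICE_UNIT = 1_000_000  # the tables on this page quote prices per 1M tokens


@dataclass(frozen=True)
class TokenRates:
    """Per-1M-token prices in USD. All values used below are placeholders."""
    input_per_1m: float
    output_per_1m: float


def standard_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go cost: input and output tokens are billed at separate rates."""
    total = input_tokens * rates.input_per_1m + output_tokens * rates.output_per_1m
    return total / TOKENS_PER_PRICE_UNIT


def batch_cost(global_rates: TokenRates, input_tokens: int, output_tokens: int,
               discount: float = 0.50) -> float:
    """Batch cost, modelled (as an assumption) as a flat discount on Global Standard."""
    return standard_cost(global_rates, input_tokens, output_tokens) * (1 - discount)


# Hypothetical rates, NOT real prices: take actual figures from the pricing tables.
example_rates = TokenRates(input_per_1m=2.00, output_per_1m=8.00)
on_demand = standard_cost(example_rates, input_tokens=3_000_000, output_tokens=500_000)
batched = batch_cost(example_rates, input_tokens=3_000_000, output_tokens=500_000)
print(f"Standard: ${on_demand:.2f}, Batch: ${batched:.2f}")
```

With these placeholder rates the same workload costs half as much through Batch, in exchange for waiting up to 24 hours for the completions.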

You can choose from the following deployment types for Standard and Provisioned, which give you greater flexibility and control over pricing and performance. This flexibility helps when data processing boundaries are increasingly restrictive and you need higher throughput at a lower price.

Global Deployment – Global SKU
Data Zone Deployment – Geographic based (EU or US)
Regional Deployment – Local Region (up to 27 regions)

Explore pricing options

Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered with Microsoft, date of purchase, and the currency exchange rate. Prices are calculated based on US dollars and converted using London closing spot rates that are captured in the two business days prior to the last business day of the previous month end. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding the two business days. This rate applies to all transactions during the forthcoming month. Sign in to the Azure pricing calculator to see pricing based on your current programme/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See frequently asked questions about Azure pricing.

o1

o1 is the new reasoning model series for complex tasks. The model has 200K context and an October 2023 knowledge cut-off.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o1 2024-12-17 Global	Input: $- Cached Input: $- Output: $-	N/A
o1 2024-12-17 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1 2024-12-17 Regional	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 Global	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 Regional	Input: $- Cached Input: $- Output: $-	N/A

Plan with the Pricing Calculator

o3 Mini

o3-mini is the successor to the o1-mini model. It is a fast, cost-efficient reasoning model tailored to coding, math, and science use cases.

The o3-mini model has an expanded context input window of 200K tokens and a maximum output of 100K tokens, leaving room for complex and detailed responses. The o1-mini model has a 128K context input window. Both o3-mini and o1-mini have a knowledge cutoff of October 2023.
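The limits quoted above can be turned into a simple pre-flight check before sending a request. This is only a sketch: the `MODEL_LIMITS` table and the `fits_limits` helper are hypothetical names, "K" is read as 1,000 tokens, and treating the input window and the output cap as two independent checks is an assumption. The o1-mini output cap is not stated here, so it is left unset.

```python
from typing import Optional

# Limits as quoted in the text; "K" is read as 1,000 tokens (an assumption).
MODEL_LIMITS = {
    "o3-mini": {"max_input": 200_000, "max_output": 100_000},
    "o1-mini": {"max_input": 128_000, "max_output": None},  # output cap not given here
}


def fits_limits(model: str, input_tokens: int,
                max_output_tokens: Optional[int] = None) -> bool:
    """Return True if the request stays within the stated limits for the model."""
    limits = MODEL_LIMITS[model]
    if input_tokens > limits["max_input"]:
        return False
    cap = limits["max_output"]
    if cap is not None and max_output_tokens is not None and max_output_tokens > cap:
        return False
    return True


print(fits_limits("o3-mini", 150_000, 50_000))  # within both stated limits
print(fits_limits("o1-mini", 150_000))          # exceeds the 128K input window
```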

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o3 mini 2025-01-31 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 mini 2025-01-31 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 mini 2025-01-31 Regional	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 Global	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 Regional	Input: $- Cached Input: $- Output: $-	N/A

Realtime API

Featured in the Realtime API, the GPT-4o-Realtime-Preview supports multilingual speech-to-speech capabilities. Optimized for real-time, low-latency conversations, it enables natural interactions with minimal delay, ideal for chatbots and conversational AI. GPT-4o is the comprehensive, more powerful version designed for complex tasks, while GPT-4o Mini is a smaller, more affordable option ideal for simpler applications where cost-efficiency and speed are priorities.

Model	Pricing (1M Tokens)
GPT-4o-Realtime-Preview-2024-12-17-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-12-17-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-

Chat Completions API

Featured in the Chat Completions API, the GPT-4o-Audio-Preview model processes and generates audio content. It supports advanced features like speech recognition and audio synthesis, ideal for asynchronous speech interactions and sentiment analysis. GPT-4o is the comprehensive, more powerful version designed for complex tasks, while GPT-4o Mini is a smaller, more affordable option ideal for simpler applications where cost-efficiency and speed are priorities.

Model	Pricing (1M Tokens)
GPT-4o-Audio-Preview-2024-12-17-Global	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Audio-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Audio-Preview-2024-12-17-Regional	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-Global	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-Regional	Text Input: $- Output: $- Audio Input: $- Output: $-

GPT-4o

GPT-4o is the most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4o-2024-11-20 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-11-20 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-11-20 Regional	Input: $- Cached Input: $- Output: $-	N/A
GPT-4o-2024-08-06 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-08-06 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-08-06 Regional	Input: $- Cached Input: $- Output: $-	N/A
GPT-4o-2024-05-13 Global	Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-05-13 US/EU – Data Zones	Input: $- Output: $-	N/A
GPT-4o-2024-05-13 Regional	Input: $- Output: $-	N/A

GPT-4o mini

GPT-4o mini is the most cost-efficient small model, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4o-mini-0718 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-mini-0718 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-mini-0718 Regional	Input: $- Cached Input: $- Output: $-	N/A

Provisioned

You can allocate and manage throughput for deployments, ensuring predictable performance and stable capacity. You are charged an hourly rate per model regardless of usage, but you can also secure additional savings through monthly and annual reservations. Discover how to transition your regional deployments and provisioned reservations to global and data zones on this Learn page.
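To compare hourly billing with a reservation, a small calculation along these lines may help. The minimum PTU values are copied from the Provisioned table in this section; everything else is an assumption: the rates are hypothetical placeholders, prices are taken to be per PTU, a month is approximated as 730 hours, and the helper names are invented for this example.

```python
HOURS_PER_MONTH = 730  # approximation (8,760 hours / 12); an assumption

# Minimum PTUs per deployment, copied from the Provisioned table in this section.
MIN_PTUS = {
    ("GPT-4o", "Global"): 15,
    ("GPT-4o", "Data Zone"): 15,
    ("GPT-4o", "Regional"): 50,
    ("GPT-4o Mini", "Global"): 15,
    ("GPT-4o Mini", "Data Zone"): 15,
    ("GPT-4o Mini", "Regional"): 25,
}


def check_minimum(model: str, deployment: str, ptus: int) -> None:
    """Raise if the requested PTU count is below the listed minimum."""
    minimum = MIN_PTUS[(model, deployment)]
    if ptus < minimum:
        raise ValueError(f"{model} {deployment} needs at least {minimum} PTUs")


def hourly_cost(ptus: int, rate_per_ptu_hour: float, hours: float) -> float:
    """Hourly billing is charged for every deployed hour, regardless of usage."""
    return ptus * rate_per_ptu_hour * hours


def break_even_hours(rate_per_ptu_hour: float, monthly_rate_per_ptu: float) -> float:
    """Deployed hours per month above which a monthly reservation is cheaper."""
    return monthly_rate_per_ptu / rate_per_ptu_hour


# Hypothetical rates, NOT real prices.
HOURLY_RATE, MONTHLY_RATE = 1.00, 260.00
check_minimum("GPT-4o", "Global", 15)
full_month = hourly_cost(15, HOURLY_RATE, HOURS_PER_MONTH)
reserved = 15 * MONTHLY_RATE
print(f"Hourly for a full month: ${full_month:,.2f}, reserved: ${reserved:,.2f}")
print(f"Break-even: {break_even_hours(HOURLY_RATE, MONTHLY_RATE):.0f} hours per month")
```

Under these placeholder figures, a deployment that stays up for more than the break-even number of hours in a month would cost less on a monthly reservation than on hourly billing.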

Model	Min PTUs	PTU Hourly pricing	PTU Monthly Reservation Pricing	PTU Yearly Reservation Pricing
GPT-4o Global	15	$-	$-	$-
GPT-4o US/EU Data Zones	15	$-	$-	$-
GPT-4o Regional	50	$-	$-	$-
Fine-Tuned GPT-4o-Regional	50	$-	$-	$-
GPT-4o Mini Global	15	$-	$-	$-
GPT-4o Mini US/EU Data Zones	15	$-	$-	$-
GPT-4o Mini Regional	25	$-	$-	$-
Fine-Tuned GPT-4o-Mini Regional	25	$-	$-	$-

Base models

Models	Usage per 1,000 tokens
Babbage-002	$-
Davinci-002	$-
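Note that base models are quoted per 1,000 tokens, whereas most other tables on this page are quoted per 1M tokens. When comparing across tables it can help to normalise the unit first. The helpers below are illustrative only, and the rate used is a hypothetical placeholder.

```python
def per_1k_to_per_1m(price_per_1k: float) -> float:
    """Convert a per-1,000-token price to the per-1M-token unit used elsewhere."""
    return price_per_1k * 1_000


def cost_from_per_1k(price_per_1k: float, tokens: int) -> float:
    """Cost of a token count at a per-1,000-token price."""
    return tokens / 1_000 * price_per_1k


# Hypothetical rate, NOT a real price.
example_rate_per_1k = 0.5
print(per_1k_to_per_1m(example_rate_per_1k))           # same price expressed per 1M tokens
print(cost_from_per_1k(example_rate_per_1k, 250_000))  # cost of 250,000 tokens
```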

Fine-tuning models

Model	Deployment	Pricing
GPT-4o-2024-08-06	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT-4o-2024-08-06	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: use regional Hosting: $-/hour
GPT-4o-mini	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT-4o-mini	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: use regional Hosting: $-/hour
GPT-4-0613 (8K)	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT-3.5-Turbo (16K)	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT-3.5-Turbo (4K)	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
Babbage-002		Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
Davinci-002		Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour

Assistants API

The Assistants API and its tools make it easy for developers to build AI Assistants in their applications.

Tokens used by the Assistants API are billed at the per-token input and output rates of the language model chosen for each Assistant. Additionally, we charge the following fees for tool usage:

Tool	Input
File Search^*	$-/GB of vector-storage per day (1 GB free)
Code Interpreter^**	$-/session

^*GB refers to binary gigabytes, where 1 GB is 2^30 bytes.
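A File Search storage charge could be estimated along the following lines. This sketch assumes the free 1 GB is subtracted before billing and that fractional gigabytes are charged proportionally; neither detail is confirmed here. The rate is a hypothetical placeholder and `file_search_daily_cost` is an invented helper name.

```python
BYTES_PER_GB = 2 ** 30  # binary gigabyte, as defined in the footnote
FREE_GB = 1.0           # free allowance stated in the tool table


def file_search_daily_cost(stored_bytes: int, rate_per_gb_day: float) -> float:
    """Daily vector-storage charge after deducting the free allowance (assumed model)."""
    stored_gb = stored_bytes / BYTES_PER_GB
    billable_gb = max(0.0, stored_gb - FREE_GB)
    return billable_gb * rate_per_gb_day


# Hypothetical rate, NOT a real price.
RATE = 0.5
print(file_search_daily_cost(3 * BYTES_PER_GB, RATE))  # 2 billable GB
print(file_search_daily_cost(BYTES_PER_GB // 2, RATE))  # within the free allowance
```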

^**If your assistant calls Code Interpreter simultaneously in two different threads, this would create two Code Interpreter sessions (2 * $-). Each session is active by default for one hour, which means that you would only pay this fee once if your user keeps giving instructions to Code Interpreter in the same thread for up to one hour.
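The session rule in this footnote can be sketched as a counter over Code Interpreter calls. The sketch assumes a session starts with the first call in a thread and lasts a fixed hour from that point, without being extended by later activity; the text does not confirm how the window is measured. `count_sessions` is a hypothetical helper.

```python
from datetime import datetime, timedelta

SESSION_LIFETIME = timedelta(hours=1)  # default session lifetime from the footnote


def count_sessions(calls):
    """Count billable sessions for an iterable of (thread_id, timestamp) calls.

    A call opens a new session when its thread has no session yet, or when the
    thread's current session started an hour or more ago (assumed fixed window).
    """
    session_start = {}
    sessions = 0
    for thread_id, ts in sorted(calls, key=lambda call: call[1]):
        start = session_start.get(thread_id)
        if start is None or ts - start >= SESSION_LIFETIME:
            session_start[thread_id] = ts
            sessions += 1
    return sessions


t0 = datetime(2025, 1, 1, 9, 0)
calls = [
    ("thread-A", t0),
    ("thread-B", t0),                          # simultaneous call in a second thread
    ("thread-A", t0 + timedelta(minutes=30)),  # same thread, within the hour
    ("thread-A", t0 + timedelta(minutes=90)),  # same thread, after the hour
]
print(count_sessions(calls))  # multiply by the per-session fee to estimate the charge
```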

Image models

Models	Quality	Resolution	Price (per 100 images)
Dall-E-3	Standard	1024 * 1024	$-
	Standard	1024 * 1792, 1792 * 1024	$-
Dall-E-3	HD	1024 * 1024	$-
	HD	1024 * 1792, 1792 * 1024	$-
Dall-E-2	Standard	1024 * 1024	$-

Embedding models

Models	Per 1,000 tokens
Ada	$-
text-embedding-3-large	$-
text-embedding-3-small	$-

Speech Models

Models	Price
Whisper	$-/hour
TTS (Text to Speech)	$-/1M characters
TTS HD	$-/1M characters
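Speech models use two different billing units: audio duration for Whisper and character count for TTS. The sketch below shows both conversions. It assumes that audio is charged in proportion to its length and that characters are counted the way Python's `len()` counts them; the rates are hypothetical placeholders and the function names are invented for this example.

```python
def whisper_cost(audio_seconds: float, rate_per_hour: float) -> float:
    """Transcription cost, assuming charges scale linearly with audio length."""
    return audio_seconds / 3600 * rate_per_hour


def tts_cost(text: str, rate_per_1m_chars: float) -> float:
    """Text-to-speech cost, assuming characters are counted as len(text)."""
    return len(text) / 1_000_000 * rate_per_1m_chars


# Hypothetical rates, NOT real prices.
print(whisper_cost(audio_seconds=1800, rate_per_hour=0.5))  # half an hour of audio
print(tts_cost("a" * 500_000, rate_per_1m_chars=16.0))      # 500,000 characters
```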

Legacy Language Models

Models	Context	Input (Per 1M Tokens)	Output (Per 1M Tokens)
GPT-3.5-Turbo-0301	4K	$-	$-
GPT-3.5-Turbo-0613	4K	$-	$-
GPT-3.5-Turbo-0613	16K	$-	$-
GPT-3.5-Turbo-1106	16K	$-	$-
GPT-3.5-Turbo-0125	16K	$-	$-
GPT-3.5-Turbo-Instruct	4K	$-	$-
GPT-4-Turbo	128K	$-	$-
GPT-4-Turbo-Vision	128K	$-	$-
GPT-4	8K	$-	$-
GPT-4	32K	$-	$-

Azure pricing and purchasing options

Connect with us directly

Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimisation and request a customised proposal.

Talk to a sales specialist

See ways to purchase

Purchase Azure services through the Azure website, a Microsoft representative or an Azure partner.

Explore your options

Additional resources

Frequently asked questions

Frequently asked questions about Azure pricing

Azure OpenAI Service offers pricing based on both Pay-As-You-Go and Provisioned Throughput Units (PTUs). Pay-As-You-Go lets you pay only for the resources you consume, making it flexible for variable workloads. PTUs offer a predictable pricing model where you reserve and deploy a specific amount of model processing capacity. This model is ideal for workloads with consistent or predictable usage patterns, providing stability and cost control.
Azure Products by Region | Microsoft Azure
SLA for Azure AI Services | Microsoft Azure
To learn more about PTUs and Azure OpenAI pricing, please read the PTU documentation or contact a sales specialist.

Talk to a sales specialist for a walk-through of Azure pricing. Understand pricing for your cloud solution.

Request a pricing quote

Get free cloud services and a $200 credit to explore Azure for 30 days.

Try Azure for free
