Azure OpenAI Service pricing overview
To support customers throughout their journey, we offer pricing and cost management options to meet your needs, including:
- Standard (On-Demand): Pay as you go for input and output tokens.
- Provisioned (PTUs): Allocate throughput with predictable costs; monthly and annual reservations are available to reduce overall spend.
- Batch API: Language models are also available in the Batch API for global deployments and three regions. Batch jobs return completions within 24 hours at a 50% discount on Global Standard pricing.

Deployment types:
- Global deployment: Global SKU
- Data Zone deployment: geography-based (EU or US)
- Regional deployment: local region (up to 27 regions)
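The following sketch prices the same job under Standard and under the Batch API, to illustrate the trade-off between them. It assumes the Batch discount is a flat 50% off the Global Standard per-1M-token rates, as described above. The `TokenRates` type, the function names, and the dollar figures are hypothetical placeholders, not published prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical Global Standard rates in USD per 1M tokens (placeholders only)."""
    input_per_1m: float
    output_per_1m: float


def standard_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go cost of a job at Global Standard rates."""
    return (input_tokens * rates.input_per_1m + output_tokens * rates.output_per_1m) / 1_000_000


def batch_cost(rates: TokenRates, input_tokens: int, output_tokens: int,
               discount: float = 0.5) -> float:
    """Batch API cost, modeled as a flat discount on Global Standard pricing."""
    if not 0.0 <= discount <= 1.0:
        raise ValueError("discount must be between 0 and 1")
    return standard_cost(rates, input_tokens, output_tokens) * (1.0 - discount)


if __name__ == "__main__":
    rates = TokenRates(input_per_1m=2.0, output_per_1m=8.0)  # illustrative numbers only
    print(standard_cost(rates, 10_000_000, 2_000_000))  # 36.0
    print(batch_cost(rates, 10_000_000, 2_000_000))     # 18.0
```

The discount applies only to work that can tolerate the up-to-24-hour turnaround, so interactive traffic still belongs on Standard or Provisioned deployments.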
Explore pricing options
Prices are estimates only and are not intended as actual price quotes. Actual prices may vary depending on the type of agreement entered with Microsoft, the date of purchase, and the currency exchange rate. Prices are calculated in US dollars and converted using London closing spot rates captured in the two business days before the last business day of the previous month. If the two business days before the end of the month fall on a bank holiday in major markets, the rate-setting day is generally the day immediately preceding those two business days. This rate applies to all transactions during the upcoming month. Sign in to the Azure pricing calculator to see pricing based on your current program or offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See the frequently asked questions about Azure pricing.
US government entities can purchase Azure Government services from a licensing solution provider with no upfront financial commitment, or directly through a pay-as-you-go online subscription.
Important: The price in R$ is only a reference. These are international transactions, and the final price is subject to exchange rates and the inclusion of taxes on financial operations. No eNF will be issued.
GPT-5.2
GPT-5.2 delivers the deep reasoning and expanded context handling necessary for building sophisticated AI agents capable of automating complex, long-running tasks across all business functions.
| Model | Pricing (per 1M tokens) |
|---|---|
| GPT-5.2 Codex Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.2 Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.2 Data Zone | Input: $- Cached Input: $- Output: $- |
| GPT-5.2-chat Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.2-chat Data Zone | Input: $- Cached Input: $- Output: $- |
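The tables in this section list separate Input, Cached Input, and Output rates. The following is a minimal per-request estimate. It assumes that cached tokens are the part of the prompt served from the prompt cache, so they are billed at the Cached Input rate instead of, not in addition to, the Input rate. The class name, function name, and rates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedTokenRates:
    """USD per 1M tokens. Values passed in are placeholders, not published prices."""
    input_per_1m: float
    cached_input_per_1m: float
    output_per_1m: float


def request_cost(rates: CachedTokenRates, prompt_tokens: int,
                 cached_tokens: int, output_tokens: int) -> float:
    """Estimate one request's cost, assuming cached tokens are a subset of the prompt."""
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    total = (
        uncached * rates.input_per_1m               # fresh prompt tokens
        + cached_tokens * rates.cached_input_per_1m  # prompt tokens served from cache
        + output_tokens * rates.output_per_1m        # generated tokens
    )
    return total / 1_000_000


if __name__ == "__main__":
    rates = CachedTokenRates(input_per_1m=1.0, cached_input_per_1m=0.25, output_per_1m=4.0)
    print(request_cost(rates, prompt_tokens=1_000_000, cached_tokens=800_000,
                       output_tokens=100_000))
```

Under these placeholder rates, raising the cached share of a long, repeated system prompt lowers the input portion of the bill. Output tokens are unaffected.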
GPT-5.1
The GPT-5.1 series is built to respond faster across a wide range of situations through adaptive reasoning: by varying thinking time more widely, it improves latency and cost efficiency across the series. The series also brings tooling improvements, greater visibility into stepwise reasoning, multimodal intelligence, and enterprise-grade compliance.
| Model | Pricing (per 1M tokens) |
|---|---|
| GPT-5.1 Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.1 Data Zone | Input: $- Cached Input: $- Output: $- |
| GPT-5.1-chat Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.1-codex Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.1-codex-max Global | Input: $- Cached Input: $- Output: $- |
| GPT-5.1-codex-mini Global | Input: $- Cached Input: $- Output: $- |
GPT-5 series
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-5 2025-08-07 Global | Input: $- Cached Input: $- Output: $- | Input: $- Cached Input: $- Output: $- |
| GPT-5 Data Zone | Input: $- Cached Input: $- Output: $- | Input: $- Cached Input: $- Output: $- |
| GPT-5 Pro Global | Input: $- Output: $- | N/A |
| GPT-5 Codex Global | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-5-mini Global | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-5-mini Data Zone | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-5-nano Global | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-5-nano Data Zone | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-5 chat Global | Input: $- Cached Input: $- Output: $- | N/A |
Deep Research
Deep Research enables developers and enterprises to automate complex research tasks with structured, citation-rich answers. It is suitable for building customer support bots, internal knowledge assistants, or market analysis tools, and it delivers transparent, auditable insights grounded in real-time web data. Search context tokens are billed at the input-token price of the model in use. Grounding with Bing Search and the base GPT model used for clarifying questions are billed separately.
| Model | Pricing |
|---|---|
| o3-deep-research Global | Input: $- Cached Input: $- Output: $- |
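A Deep Research run combines several charges. The following rough sketch adds them up. It assumes that search context tokens are billed at the deep-research model's regular (uncached) input rate, as the paragraph above states. It also assumes that the Bing grounding charge and the clarifying-question model charge have already been computed elsewhere and are passed in as dollar amounts. The function and parameter names are hypothetical.

```python
def deep_research_cost(
    input_per_1m: float,
    cached_input_per_1m: float,
    output_per_1m: float,
    input_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    search_context_tokens: int,
    bing_grounding_usd: float = 0.0,
    clarifying_model_usd: float = 0.0,
) -> float:
    """Rough total for one Deep Research run (model rates in USD per 1M tokens)."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    uncached = input_tokens - cached_tokens
    model_usd = (
        # Search context tokens are billed at the model's regular input rate.
        (uncached + search_context_tokens) * input_per_1m
        + cached_tokens * cached_input_per_1m
        + output_tokens * output_per_1m
    ) / 1_000_000
    # Bing grounding and the clarifying-question model are billed separately;
    # here they are precomputed dollar amounts supplied by the caller.
    return model_usd + bing_grounding_usd + clarifying_model_usd
```

Search context can be much larger than the user's own prompt, so it often dominates the input side of the bill.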
o3
o3 is a powerful reasoning model from the o-series of reasoning models, pushing the frontier across coding, math, science, and visual perception. It excels in complex queries requiring multi-faceted analysis and performs strongly in visual tasks like analyzing images, charts, and graphics. The model features a 200K token context window and has a knowledge cutoff of June 2024.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| o3 2025-04-16 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o3 2025-04-16 Data Zone | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o3 2025-04-16 Regional | Input: $- Cached Input: $- Output: $- | N/A |
o4-mini
o4-mini is a compact, efficient, and cost-effective reasoning model from OpenAI's o-series. It excels in math, coding, and visual tasks. The model features a 200K token context window and has a knowledge cutoff of June 2024.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| o4-mini 2025-04-16 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o4-mini 2025-04-16 Data Zone | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o4-mini 2025-04-16 Regional | Input: $- Cached Input: $- Output: $- | N/A |
GPT-4.1 series
The GPT-4.1 series consists of highly capable general-purpose models with extensive world knowledge and an enhanced ability to understand user intent, which makes them particularly adept at creative tasks and agentic planning. The series features a 1-million-token context window and a knowledge cutoff of June 2024.
| Model | Standard Pricing (per 1M tokens) | Priority Processing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|---|
| GPT-4.1-2025-04-14 Global | Input: $- Cached Input: $- Output: $- | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4.1-2025-04-14 Data Zone | Input: $- Cached Input: $- Output: $- | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4.1-2025-04-14 Regional | Input: $- Cached Input: $- Output: $- | N/A | N/A |
| GPT-4.1-mini-2025-04-14 Global | Input: $- Cached Input: $- Output: $- | N/A | Input: $- Output: $- |
| GPT-4.1-mini-2025-04-14 Data Zone | Input: $- Cached Input: $- Output: $- | N/A | Input: $- Output: $- |
| GPT-4.1-mini-2025-04-14 Regional | Input: $- Cached Input: $- Output: $- | N/A | N/A |
| GPT-4.1-nano-2025-04-14 Global | Input: $- Cached Input: $- Output: $- | N/A | Input: $- Output: $- |
| GPT-4.1-nano-2025-04-14 Data Zone | Input: $- Cached Input: $- Output: $- | N/A | Input: $- Output: $- |
| GPT-4.1-nano-2025-04-14 Regional | Input: $- Cached Input: $- Output: $- | N/A | N/A |
Sora in Azure OpenAI
Sora is a multimodal generative AI model now available in Azure AI Foundry, designed to help creative teams bring ideas to life through seamless API-first integration. Built on Azure’s enterprise-grade infrastructure, it offers secure, scalable deployment for transforming concepts into high-quality visual content.
Sora 2
| Model | Output resolution | Price per second |
|---|---|---|
| Sora 2 Global | Portrait: 720x1280; Landscape: 1280x720 | $- |
Sora
| Resolution (price per second) | 1-5s | 6-10s | 11-15s | 16-20s |
|---|---|---|---|---|
| 480 Square Global | $- | $- | $- | $- |
| 480p Global | $- | $- | $- | $- |
| 720 Square Global | $- | $- | $- | $- |
| 720p Global | $- | $- | $- | $- |
| 1080 Square Global | $- | $- | $- | $- |
| 1080p Global | $- | $- | $- | $- |
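The Sora table lists a per-second price that depends on the resolution and on which duration band the clip falls into. The following sketch assumes that a clip is billed at its band's per-second rate for its full length in whole seconds. This is an assumption: the table does not say how partial seconds or band boundaries are handled. The names are hypothetical.

```python
# Duration bands from the Sora table, inclusive, in whole seconds.
DURATION_BANDS: tuple[tuple[int, int], ...] = ((1, 5), (6, 10), (11, 15), (16, 20))


def sora_clip_cost(duration_s: int, per_second_by_band: dict[tuple[int, int], float]) -> float:
    """Estimate a clip's cost from one resolution's per-second rates.

    `per_second_by_band` maps each (low, high) band to that resolution's
    USD-per-second rate, e.g. the 720p Global row. Values are placeholders.
    """
    for low, high in DURATION_BANDS:
        if low <= duration_s <= high:
            # Assumed: the whole clip is billed at its band's per-second rate.
            return duration_s * per_second_by_band[(low, high)]
    raise ValueError("the Sora table only covers clips of 1-20 seconds")
```

Keeping the bands in one constant makes it easy to adjust if the published bands change.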
GPT-Image-1
GPT-Image-1 improves on DALL·E with better instruction following, accurate text rendering, and support for image input and editing. The model is priced per token, with separate rates for text and image tokens.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-Image-1.5 Global | Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Text: $- Output Image: $- | N/A |
| GPT-Image-1-mini Global | Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $- | N/A |
| GPT-Image-1 Global | Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $- | N/A |
| GPT-Image-1 Regional | Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $- | N/A |
| GPT-Image-1 Data Zone | Input Text: $- Input Image: $- Output Image: $- | N/A |
GPT-4.5
GPT-4.5-preview is a general-purpose model with deep world knowledge and a better understanding of user intent, which makes it well suited to creative tasks and agentic planning. The model has a 128K context window and an October 2023 knowledge cutoff.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-4.5-Preview-2025-02-27 Global | Input: $- Cached Input: $- Output: $- | N/A |
o1
o1 is a reasoning model series for complex tasks. The model has a 200K context window and an October 2023 knowledge cutoff.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| o1 2024-12-17 Global | Input: $- Cached Input: $- Output: $- | N/A |
| o1 2024-12-17 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | N/A |
| o1 2024-12-17 Regional | Input: $- Cached Input: $- Output: $- | N/A |
| o1 preview 2024-09-12 Global | Input: $- Cached Input: $- Output: $- | N/A |
| o1 preview 2024-09-12 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | N/A |
| o1 preview 2024-09-12 Regional | Input: $- Cached Input: $- Output: $- | N/A |
Plan with the Pricing Calculator
o3 Mini
o3-mini is the successor to o1-mini: a fast, cost-efficient reasoning model tailored to coding, math, and science use cases.
o3-mini has a 200K-token context window and a maximum output of 100K tokens, leaving ample room for complex, detailed responses. o1-mini has a 128K-token context window. Both o3-mini and o1-mini have a knowledge cutoff of October 2023.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| o3 mini 2025-01-31 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o3 mini 2025-01-31 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| o3 mini 2025-01-31 Regional | Input: $- Cached Input: $- Output: $- | N/A |
| o1-mini 2024-09-12 Global | Input: $- Cached Input: $- Output: $- | N/A |
| o1-mini 2024-09-12 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | N/A |
| o1-mini 2024-09-12 Regional | Input: $- Cached Input: $- Output: $- | N/A |
Open Source Models
gpt-oss-120b offers a high-performing, open and controllable LLM—blending frontier reasoning skills with enterprise-grade flexibility and deployment autonomy.
| Model | Pricing (per 1M tokens) |
|---|---|
| gpt-oss-120b | Input: $- Output: $- |
Audio Models
GPT-realtime and GPT-audio models are available through Azure AI Foundry and Azure OpenAI Service, enabling high-fidelity, low-latency voice interactions for production applications. Additional audio models include GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-tts. These provide speech-to-text and text-to-speech with emotionally expressive voices, customizable outputs, and high accuracy, which makes them suitable for live customer call centers, real-time captioning, and interactive voice agents. The models use pretraining and distillation techniques to support natural turn-taking and stable APIs for multimodal deployments.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-realtime | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $-; Image Input: $- Cached Input: $- | N/A |
| GPT-realtime-mini-2025-12-15 Global | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $-; Image Input: $- Cached Input: $- | N/A |
| GPT-audio | Text Input: $- Output: $-; Audio Input: $- Output: $- | N/A |
| GPT-audio-mini-2025-12-15 Global | Text Input: $- Output: $-; Audio Input: $- Output: $- | N/A |
| GPT-4o-Transcribe | Text Input: $- Output: $-; Audio Input: $- Output: N/A | N/A |
| GPT-4o-transcribe-diarize | Text Input: $- Output: $-; Audio Input: $- Output: N/A | N/A |
| GPT-4o-mini-transcribe-2025-12-15 | Text Input: $- Output: $-; Audio Input: $- Output: N/A | N/A |
| GPT-4o-mini-TTS-2025-12-15 | Text Input: $- Output: N/A; Audio Input: N/A Output: $- | N/A |
Computer-Using Agent (CUA)
The Computer-Using Agent (CUA) is a specialized AI model that allows AI to interact with graphical user interfaces (GUIs), navigate applications, and automate multi-step tasks—all through natural language instructions. The CUA model can be used as a tool in the Responses API.
| Model | Pricing |
|---|---|
| computer-use-preview Global | Input: $-/1M tokens Output: $-/1M tokens |
Built-in tools
The Responses API and the Assistants API enable seamless interaction with tools like computer use, code interpreter, function calling, and file search, making it easy for developers to build AI agents.
| Tool | Pricing |
|---|---|
| Computer Use (Responses API only) | Input: $-/1M tokens Output: $-/1M tokens |
| File Search Tool Call (Responses API only) | $-/1K tool calls |
| File Search* | $-/GB of vector storage per day (1 GB free) |
| Code Interpreter** | $-/session |
*GB refers to binary gigabytes, where 1 GB is 2^30 bytes.
**If your assistant calls Code Interpreter simultaneously in two different threads, this creates two Code Interpreter sessions (2 * $-). Each session is active for one hour by default, so you pay this fee only once if your user keeps giving instructions to Code Interpreter in the same thread for up to one hour.
Inference costs (input and output) vary with the GPT model used by each assistant.
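The following sketch illustrates the two billing rules in the footnotes above: storage in binary gigabytes with 1 GB free, and per-thread Code Interpreter sessions. It rests on three assumptions. First, the 1 GB free allowance is subtracted from each day's total storage. Second, a session lasts one hour from its first call rather than from the most recent call. Third, a call arriving exactly at the one-hour mark opens a new session. All function names are hypothetical.

```python
GIB = 2 ** 30  # binary gigabyte: 1 GB = 2^30 bytes, per the footnote above


def vector_storage_daily_cost(stored_bytes: int, usd_per_gb_day: float,
                              free_gb: float = 1.0) -> float:
    """One day's File Search storage charge after the free allowance."""
    billable_gb = max(0.0, stored_bytes / GIB - free_gb)
    return billable_gb * usd_per_gb_day


def count_code_interpreter_sessions(calls: list[tuple[str, float]],
                                    session_seconds: float = 3600.0) -> int:
    """Count billable sessions from (thread_id, unix_time) call records.

    A thread opens a session on its first call. Later calls on the same thread
    reuse it until `session_seconds` have passed since the session started;
    the next call after that opens a new session. Different threads never
    share a session.
    """
    session_start: dict[str, float] = {}
    sessions = 0
    for thread_id, t in sorted(calls, key=lambda call: call[1]):
        start = session_start.get(thread_id)
        if start is None or t - start >= session_seconds:
            session_start[thread_id] = t
            sessions += 1
    return sessions
```

Two threads calling Code Interpreter at the same moment yield two sessions, which matches the 2 * $- example in the footnote.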
Realtime API
Available through the Realtime API, GPT-4o-Realtime-Preview supports multilingual speech-to-speech. Optimized for low-latency, real-time conversation, it enables natural interactions with minimal delay, which suits chatbots and conversational AI. GPT-4o is the larger, more capable version for complex tasks, while GPT-4o mini is a smaller, more affordable option for simpler applications where cost efficiency and speed are priorities.
| Model | Pricing (per 1M tokens) |
|---|---|
| GPT-4o-Realtime-Preview-2024-12-17-Global | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Realtime-Preview-2024-12-17-US/EU – Data Zones | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Realtime-Preview-2024-12-17-Regional | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Mini-Realtime-Preview-2024-12-17-Global | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Mini-Realtime-Preview-2024-12-17-US/EU – Data Zones | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Mini-Realtime-Preview-2024-12-17-Regional | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Realtime-Preview-2024-10-01-Global | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Realtime-Preview-2024-10-01-US/EU – Data Zones | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
| GPT-4o-Realtime-Preview-2024-10-01-Regional | Text Input: $- Cached Input: $- Output: $-; Audio Input: $- Cached Input: $- Output: $- |
Chat Completions API
Available through the Chat Completions API, GPT-4o-Audio-Preview processes and generates audio. It supports features such as speech recognition and audio synthesis, which suit asynchronous speech interactions and sentiment analysis. GPT-4o is the larger, more capable version for complex tasks, while GPT-4o mini is a smaller, more affordable option for simpler applications where cost efficiency and speed are priorities.
| Model | Pricing (per 1M tokens) |
|---|---|
| GPT-4o-Audio-Preview-2024-12-17-Global | Text Input: $- Output: $-; Audio Input: $- Output: $- |
| GPT-4o-Audio-Preview-2024-12-17-US/EU – Data Zones | Text Input: $- Output: $-; Audio Input: $- Output: $- |
| GPT-4o-Audio-Preview-2024-12-17-Regional | Text Input: $- Output: $-; Audio Input: $- Output: $- |
| GPT-4o-Mini-Audio-Preview-2024-12-17-Global | Text Input: $- Output: $-; Audio Input: $- Output: $- |
| GPT-4o-Mini-Audio-Preview-2024-12-17-US/EU – Data Zones | Text Input: $- Output: $-; Audio Input: $- Output: $- |
| GPT-4o-Mini-Audio-Preview-2024-12-17-Regional | Text Input: $- Output: $-; Audio Input: $- Output: $- |
GPT-4o
GPT-4o is a multimodal model that is faster and cheaper than GPT-4 Turbo and has stronger vision capabilities. The model has a 128K context window and an October 2023 knowledge cutoff.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-4o-2024-11-20 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-2024-11-20 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-2024-11-20 Regional | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-4o-2024-08-06 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-2024-08-06 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-2024-08-06 Regional | Input: $- Cached Input: $- Output: $- | N/A |
| GPT-4o-2024-05-13 Global | Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-2024-05-13 US/EU – Data Zones | Input: $- Output: $- | N/A |
| GPT-4o-2024-05-13 Regional | Input: $- Output: $- | N/A |
GPT-4o mini
GPT-4o mini is a cost-efficient small model with vision capabilities. The model has a 128K context window and an October 2023 knowledge cutoff.
| Model | Pricing (per 1M tokens) | Pricing with Batch API (per 1M tokens) |
|---|---|---|
| GPT-4o-mini-0718 Global | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-mini-0718 US/EU – Data Zones | Input: $- Cached Input: $- Output: $- | Input: $- Output: $- |
| GPT-4o-mini-0718 Regional | Input: $- Cached Input: $- Output: $- | N/A |
Provisioned
Provisioned throughput lets you allocate and manage throughput for deployments, ensuring predictable performance and stable capacity. You pay an hourly rate for the PTUs deployed for each model, regardless of usage, and you can secure additional savings with monthly and annual reservations. To learn how to transition regional deployments and provisioned reservations to global and data zone deployments, see the Microsoft Learn documentation. To check whether your desired model is available for provisioned pricing in your region, see Microsoft Learn or contact your local sales representative.
| Model | Min PTUs | PTU Hourly pricing | PTU Monthly Reservation Pricing | PTU Yearly Reservation Pricing |
|---|---|---|---|---|
| GPT-5.2 Global | 15 | $- | $- | $- |
| GPT-5.2 Data Zones | 15 | $- | $- | $- |
| GPT-5.2 Regional | 50 | $- | $- | $- |
| GPT-5.1 Codex Global | 15 | $- | $- | $- |
| GPT-5.1 Codex Data Zones | 15 | $- | $- | $- |
| GPT-5.1 Codex Regional | 50 | $- | $- | $- |
| GPT-5.1 Global | 15 | $- | $- | $- |
| GPT-5.1 Data Zones | 15 | $- | $- | $- |
| GPT-5.1 Regional | 50 | $- | $- | $- |
| GPT-5-mini Global | 15 | $- | $- | $- |
| GPT-5-mini Data Zones | 15 | $- | $- | $- |
| GPT-5-mini Regional | 50 | $- | $- | $- |
| GPT-5 Global | 15 | $- | $- | $- |
| GPT-5 Data Zones | 15 | $- | $- | $- |
| GPT-5 Regional | 50 | $- | $- | $- |
| GPT-4.1 Global | 15 | $- | $- | $- |
| GPT-4.1 Data Zones | 15 | $- | $- | $- |
| GPT-4.1 Regional | 50 | $- | $- | $- |
| GPT-4.1-mini Global | 15 | $- | $- | $- |
| GPT-4.1-mini US/EU Data Zones | 15 | $- | $- | $- |
| GPT-4.1-mini Regional | 25 | $- | $- | $- |
| GPT-4.1-nano Global | 15 | $- | $- | $- |
| GPT-4.1-nano US/EU Data Zones | 15 | $- | $- | $- |
| GPT-4.1-nano Regional | 25 | $- | $- | $- |
| o3-mini Global | 15 | $- | $- | $- |
| o3-mini US/EU Data Zones | 15 | $- | $- | $- |
| o3-mini Regional | 25 | $- | $- | $- |
| o3 Global | 15 | $- | $- | $- |
| o3 US/EU Data Zones | 15 | $- | $- | $- |
| o3 Regional | 50 | $- | $- | $- |
| o4-mini Global | 15 | $- | $- | $- |
| o4-mini US/EU Data Zones | 15 | $- | $- | $- |
| o4-mini Regional | 25 | $- | $- | $- |
| GPT-4o Global | 15 | $- | $- | $- |
| GPT-4o US/EU Data Zones | 15 | $- | $- | $- |
| GPT-4o Regional | 50 | $- | $- | $- |
| Fine-Tuned GPT-4o-Regional | 50 | $- | $- | $- |
| GPT-4o Mini Global | 15 | $- | $- | $- |
| GPT-4o Mini US/EU Data Zones | 15 | $- | $- | $- |
| GPT-4o Mini Regional | 25 | $- | $- | $- |
| Fine-Tuned GPT-4o-Mini Regional | 25 | $- | $- | $- |
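The following sketch compares one month of provisioned cost under the three pricing columns in the table above. It assumes several things. The hourly price is per PTU per hour and is charged for every provisioned hour, whether used or not. The monthly and yearly reservation prices are per PTU for the whole term. A month is 730 hours, a common convention for monthly cloud estimates. The deployment must meet the Min PTUs value. All names and figures are hypothetical.

```python
HOURS_PER_MONTH = 730  # assumption: common convention for monthly cloud estimates


def ptu_monthly_costs(ptus: int, min_ptus: int, hourly_per_ptu: float,
                      monthly_reservation_per_ptu: float,
                      yearly_reservation_per_ptu: float) -> dict[str, float]:
    """One month of provisioned cost under each pricing column."""
    if ptus < min_ptus:
        raise ValueError(f"deployment needs at least {min_ptus} PTUs")
    return {
        # Hourly billing applies to every provisioned hour, used or not.
        "hourly": ptus * hourly_per_ptu * HOURS_PER_MONTH,
        "monthly_reservation": ptus * monthly_reservation_per_ptu,
        # Spread the yearly reservation evenly across 12 months.
        "yearly_reservation": ptus * yearly_reservation_per_ptu / 12,
    }


def cheapest_option(costs: dict[str, float]) -> str:
    """Name of the lowest-cost pricing column."""
    return min(costs, key=costs.__getitem__)


if __name__ == "__main__":
    costs = ptu_monthly_costs(ptus=15, min_ptus=15, hourly_per_ptu=1.0,
                              monthly_reservation_per_ptu=260.0,
                              yearly_reservation_per_ptu=2652.0)  # placeholder prices
    print(costs, cheapest_option(costs))
```

A reservation commits you for its full term, so the lower monthly figure only pays off if the deployment stays in place for that long.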
Base models
| Models | Usage per 1,000 tokens |
|---|---|
| Babbage-002 | $- |
| Davinci-002 | $- |
Fine-tuning models
| Model | Deployment | Pricing |
|---|---|---|
| o4-mini (Reinforcement fine-tuning) | Regional | Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/hour; Hosting: $-/hour; Grader input: GPT-4o $-/1M tokens, o3-mini $-/1M tokens; Grader cached input: GPT-4o $-/1M tokens, o3-mini $-/1M tokens; Grader output: GPT-4o $-/1M tokens, o3-mini $-/1M tokens |
| | Global | Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/hour; Hosting: $-/hour; Grader input: GPT-4o $-/1M tokens, o3-mini $-/1M tokens; Grader cached input: GPT-4o $-/1M tokens, o3-mini $-/1M tokens; Grader output: GPT-4o $-/1M tokens, o3-mini $-/1M tokens |
| | Developer | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/hour |
| GPT-4.1 | Regional | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Global | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Developer | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens |
| GPT-4.1-mini | Regional | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Global | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Developer | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens |
| GPT-4.1-nano | Regional | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Global | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Developer | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens |
| GPT-4o-2024-08-06 | Regional | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Global | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Developer | Training: $-/1M tokens |
| GPT-4o-mini | Regional | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Global | Input: $-/1M tokens; Cached Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| | Developer | Training: $-/1M tokens |
| GPT-3.5-Turbo (16K) | Regional | Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
| GPT OSS 20B | Regional | Input: $-/1M tokens; Output: $-/1M tokens; Training: $-/1M tokens; Hosting: $-/hour |
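A fine-tuned model can accrue training charges, hosting charges (the Regional and Global rows list an hourly hosting rate), and inference charges. The following sketch covers the first two. It assumes that per-token training is billed on training-file tokens multiplied by the number of epochs; this is how OpenAI describes it, so confirm it for Azure. It also assumes that reinforcement fine-tuning is billed per training hour plus grader token usage, which the caller passes in as a dollar amount. The function names are hypothetical.

```python
def token_trained_cost(training_file_tokens: int, epochs: int, training_per_1m: float) -> float:
    """Training charge for rows priced per 1M training tokens (assumed: file tokens x epochs)."""
    return training_file_tokens * epochs * training_per_1m / 1_000_000


def hourly_trained_cost(training_hours: float, training_per_hour: float,
                        grader_usd: float = 0.0) -> float:
    """Training charge for reinforcement fine-tuning: hourly rate plus grader token spend."""
    return training_hours * training_per_hour + grader_usd


def hosting_cost(hosting_hours: float, hosting_per_hour: float) -> float:
    """Hosting charge for a deployed fine-tuned model; Developer rows list no hosting rate."""
    return hosting_hours * hosting_per_hour
```

Hosting accrues per hour for as long as the fine-tuned deployment exists, so over a month it can exceed a one-time training charge. Inference tokens are billed on top, at the input, cached input, and output rates in the same row.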
Image models
| Models | Quality | Resolution | Price (per 100 images) |
|---|---|---|---|
| Dall-E-3 | Standard | 1024x1024 | $- |
| Dall-E-3 | Standard | 1024x1792, 1792x1024 | $- |
| Dall-E-3 | HD | 1024x1024 | $- |
| Dall-E-3 | HD | 1024x1792, 1792x1024 | $- |
| Dall-E-2 | Standard | 1024x1024 | $- |
Embedding models
| Models | Per 1,000 tokens |
|---|---|
| Ada | $- |
| Ada DataZone | $- |
| text-embedding-3-large | $- |
| text-embedding-3-large DataZone | $- |
| text-embedding-3-small | $- |
| text-embedding-3-small DataZone | $- |
Speech Models
| Models | Price |
|---|---|
| Whisper | $-/hour |
| TTS (Text to Speech) | $-/1M characters |
| TTS HD | $-/1M characters |
Legacy Language Models
| Models | Context | Input (Per 1M Tokens) | Output (Per 1M Tokens) |
|---|---|---|---|
| GPT-3.5-Turbo-0301 | 4K | $- | $- |
| GPT-3.5-Turbo-0613 | 4K | $- | $- |
| GPT-3.5-Turbo-0613 | 16K | $- | $- |
| GPT-3.5-Turbo-1106 | 16K | $- | $- |
| GPT-3.5-Turbo-0125 | 16K | $- | $- |
| GPT-3.5-Turbo-Instruct | 4K | $- | $- |
| GPT-4-Turbo | 128K | $- | $- |
| GPT-4-Turbo-Vision | 128K | $- | $- |
| GPT-4 | 8K | $- | $- |
| GPT-4 | 32K | $- | $- |
Azure pricing and purchase options
Contact us directly
Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn how to optimize costs, and request a custom proposal.
Learn about purchase options
Purchase Azure services through the Azure website, a Microsoft representative, or an Azure partner.
Additional resources
Azure OpenAI Service
Learn more about the features and capabilities of Azure OpenAI Service.
Pricing calculator
Estimate your expected monthly costs for any combination of Azure products.
Service Level Agreement
Review the Service Level Agreement for Azure OpenAI Service.
Documentation
Review technical tutorials, videos, and more Azure OpenAI Service resources.
Frequently asked questions
Azure OpenAI Service offers both Pay-As-You-Go and Provisioned Throughput Unit (PTU) pricing. With Pay-As-You-Go, you pay for the resources you consume, which makes it flexible for variable workloads. PTUs provide a predictable pricing model in which you reserve and deploy a specific amount of model processing capacity. This suits workloads with consistent or predictable usage patterns and provides stability and cost control.
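One way to compare the two models is to find the monthly token volume at which Pay-As-You-Go spend equals a fixed PTU bill. The following sketch blends input and output prices by an assumed output-token share. It does not check whether the PTUs can actually serve that volume, because throughput per PTU is not stated on this page. The names and numbers are hypothetical.

```python
def blended_price_per_1m(input_per_1m: float, output_per_1m: float,
                         output_share: float) -> float:
    """Average Pay-As-You-Go price per 1M tokens for a given share of output tokens."""
    if not 0.0 <= output_share <= 1.0:
        raise ValueError("output_share must be between 0 and 1")
    return (1.0 - output_share) * input_per_1m + output_share * output_per_1m


def breakeven_tokens_per_month(ptu_monthly_usd: float, blended_per_1m: float) -> float:
    """Monthly token volume at which Pay-As-You-Go spend equals the PTU commitment."""
    return ptu_monthly_usd / blended_per_1m * 1_000_000


if __name__ == "__main__":
    price = blended_price_per_1m(input_per_1m=2.0, output_per_1m=8.0, output_share=0.25)
    print(breakeven_tokens_per_month(ptu_monthly_usd=3500.0, blended_per_1m=price))
```

Above the break-even volume, provisioned capacity is the cheaper option, provided the deployment has enough PTUs for peak load. Below it, Pay-As-You-Go usually costs less.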
To learn more about PTUs and Azure OpenAI pricing, read the PTU documentation or contact a sales specialist.
Talk to a sales specialist for a walkthrough of Azure pricing and to understand the pricing of your cloud solution.
Get free cloud services and a $200 credit to explore Azure for 30 days.