Azure OpenAI Service pricing

Azure OpenAI Service pricing overview

Azure OpenAI Service delivers enterprise-ready generative AI featuring powerful models from OpenAI, enabling organizations to innovate with text, audio, and vision capabilities. Beyond the cutting-edge models, companies choose Azure OpenAI Service for built-in data privacy, regional/area/global flexibility, and seamless integration into the Azure ecosystem including Fabric, Cosmos DB and Azure AI Search. Companies of all sizes can confidently scale AI solutions to enhance customer experience, automate workflows, and unlock creative potential, driving measurable impact and competitive differentiation.

To support customers on this journey, we offer pricing and cost management options to meet your needs, including:

Standard (On-Demand): Pay-as-you-go for input and output tokens.
Provisioned (PTUs): Allocate throughput with predictable costs, with monthly and annual reservations available to reduce overall spend.
Batch API: Language models are also available in the Batch API for global deployments and three regions. It returns completions within 24 hours at a 50% discount on Global Standard pricing.
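
As a rough illustration of how the Standard and Batch API options above relate, the sketch below computes a pay-as-you-go token cost and then applies the 50% Batch API discount. The rates are hypothetical placeholders (this page does not list prices), and the sketch assumes the discount applies equally to input and output tokens.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # the pricing tables quote rates per 1M tokens
BATCH_DISCOUNT = 0.50        # 50% off Global Standard pricing, as stated above


@dataclass(frozen=True)
class TokenRates:
    """Hypothetical USD rates per 1M tokens; not real Azure prices."""
    input_per_1m: float
    output_per_1m: float


def standard_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Pay-as-you-go cost: input and output tokens are billed at separate rates."""
    billed = input_tokens * rates.input_per_1m + output_tokens * rates.output_per_1m
    return billed / TOKENS_PER_UNIT


def batch_cost(rates: TokenRates, input_tokens: int, output_tokens: int) -> float:
    """Batch API cost, assuming the discount applies uniformly to both rates."""
    return standard_cost(rates, input_tokens, output_tokens) * (1 - BATCH_DISCOUNT)


# Placeholder rates chosen only to make the arithmetic easy to follow.
example_rates = TokenRates(input_per_1m=2.0, output_per_1m=8.0)
print(standard_cost(example_rates, 3_000_000, 500_000))
print(batch_cost(example_rates, 3_000_000, 500_000))
```

Replace the placeholder rates with the values shown for your region and currency before relying on the result.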

You can choose from the following deployment types for Standard and Provisioned, which give you greater flexibility and control over pricing and performance. This flexibility helps when data processing boundaries become more restrictive and you need higher throughput at a lower price.

Global Deployment – Global SKU
Data Zone Deployment – Geographic based (EU or US)
Regional Deployment – Local Region (up to 27 regions)

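Explore pricing options

Apply filters to customize pricing options to your needs.

Prices are estimates only and are not intended as actual price quotes. Actual pricing may vary depending on the type of agreement entered into with Microsoft, the date of purchase, and the currency exchange rate. Prices are calculated in US dollars and converted using London closing spot rates captured in the two business days prior to the last business day of the previous month. If the two business days prior to the end of the month fall on a bank holiday in major markets, the rate setting day is generally the day immediately preceding those two business days. This rate applies to all transactions during the upcoming month. Sign in to the Azure pricing calculator to see pricing based on your current program/offer with Microsoft. Contact an Azure sales specialist for more information on pricing or to request a price quote. See the frequently asked questions about Azure pricing.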

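US government entities are eligible to purchase Azure Government services from a licensing solution provider with no upfront financial commitment, or directly through a pay-as-you-go online subscription.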

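Important: Prices shown in R$ are for reference only. Because this is an international transaction, the final price depends on the exchange rate and on whether IOF taxes are included. An eNF will not be issued.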

GPT-Chat Latest

Model	Pricing (1M Tokens)
GPT-Chat Latest 05052026 Global	Input: $- Cached Input: $- Output: $-

GPT 5.5 Series

GPT-5.5 delivers advanced reasoning, instruction following, and agentic capabilities for production AI workloads.

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)
GPT-5.5 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.5 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.5 Long Context Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.5 Long Context Data Zone	Input: $- Cached Input: $- Output: $-	N/A

GPT 5.4 Series

Built for Reliable AI Production: Stronger reasoning, dependable execution, and agentic workflows at scale.

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-5.4 (<272k context length) Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.4 (<272k context length) Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.4 (>272k context length) Global	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-5.4 Pro (<272k context length) Global	Input: $- Output: $-	N/A	N/A
GPT-5.4 Pro (>272k context length) Global	Input: $- Output: $-	N/A	N/A
GPT-5.4 mini Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.4 nano Global	Input: $- Cached Input: $- Output: $-	N/A	N/A
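
The GPT-5.4 table above prices requests differently below and above a 272k context length, and lists a separate cached-input rate. The sketch below shows one way to pick the tier and price a single request. Several points are assumptions rather than documented behavior: "272k" is read as 272,000 tokens, a request of exactly that length is placed in the lower tier, the context length is taken to be the input token count, and cached tokens are treated as a subset of input tokens billed at the cached rate. All rates are hypothetical placeholders.

```python
from typing import NamedTuple

LONG_CONTEXT_THRESHOLD = 272_000  # assumption: "272k" means 272,000 tokens


class Rates(NamedTuple):
    """Hypothetical USD rates per 1M tokens; not real Azure prices."""
    input: float
    cached_input: float
    output: float


HYPOTHETICAL_RATES = {
    "<272k": Rates(input=1.0, cached_input=0.25, output=4.0),
    ">272k": Rates(input=2.0, cached_input=0.5, output=6.0),
}


def pick_tier(context_tokens: int) -> str:
    """Choose the pricing row for a request of the given context length."""
    if context_tokens < 0:
        raise ValueError("context_tokens must be non-negative")
    # The table does not say which row covers exactly 272k tokens;
    # this sketch places the boundary in the lower tier.
    return "<272k" if context_tokens <= LONG_CONTEXT_THRESHOLD else ">272k"


def request_cost(input_tokens: int, cached_tokens: int, output_tokens: int) -> float:
    """Price one request, billing cached input tokens at the cached rate."""
    if not 0 <= cached_tokens <= input_tokens:
        raise ValueError("cached_tokens must be between 0 and input_tokens")
    rates = HYPOTHETICAL_RATES[pick_tier(input_tokens)]
    fresh_tokens = input_tokens - cached_tokens
    billed = (fresh_tokens * rates.input
              + cached_tokens * rates.cached_input
              + output_tokens * rates.output)
    return billed / 1_000_000


print(request_cost(input_tokens=100_000, cached_tokens=50_000, output_tokens=10_000))
print(request_cost(input_tokens=300_000, cached_tokens=0, output_tokens=0))
```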

GPT-5.3 Series

Unifies the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2.

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)
GPT-5.3 Codex Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.3 Chat Global	Input: $- Cached Input: $- Output: $-	N/A

GPT-5.2

GPT-5.2 delivers the deep reasoning and expanded context handling necessary for building sophisticated AI agents capable of automating complex, long-running tasks across all business functions.

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)
GPT-5.2 Codex Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.2 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.2 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.2-chat latest Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.2-chat Data Zone	Input: $- Cached Input: $- Output: $-	N/A

GPT-5.1

The GPT-5.1 series is built to respond faster to users in a variety of situations with adaptive reasoning, improving latency and cost efficiency across the series by varying thinking time more significantly. This is combined with other tooling improvements, enhanced stepwise reasoning visibility, multimodal intelligence, and enterprise-grade compliance.

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)
GPT-5.1 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.1 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5.1-chat Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.1-codex Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.1-codex-max Global	Input: $- Cached Input: $- Output: $-	N/A
GPT-5.1-codex-mini Global	Input: $- Cached Input: $- Output: $-	N/A

GPT-5 series

Model	Pricing (1M Tokens)	Priority Processing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-5 2025-08-07 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-
GPT-5 Pro Global	Input: $- Output: $-	N/A	N/A
GPT-5 Codex Global	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-5-mini Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	N/A
GPT-5-mini Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	N/A
GPT-5-nano Global	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-5-nano Data Zone	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-5 chat Global	Input: $- Cached Input: $- Output: $-	N/A	N/A

Deep Research

Deep Research enables developers and enterprises to automate complex research tasks with structured, citation-rich answers. It is suitable for building customer support bots, internal knowledge assistants, or market analysis tools. Deep Research delivers transparent, auditable insights grounded in real-time web data. Search context tokens are charged at the input token price of the model being used. You’ll separately incur charges for Grounding with Bing Search and for the base GPT model used for clarifying questions.

Model	Pricing
o3-deep research Global	Input: $- Cached Input: $- Output: $-

o3

o3 is a powerful reasoning model from the o-series of reasoning models, pushing the frontier across coding, math, science, and visual perception. It excels in complex queries requiring multi-faceted analysis and performs strongly in visual tasks like analyzing images, charts, and graphics. The model features a 200K token context window and has a knowledge cutoff of June 2024.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o3 2025-04-16 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 2025-04-16 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 2025-04-16 Regional	Input: $- Cached Input: $- Output: $-	N/A

o4-mini

o4-mini is a compact, efficient, and cost-effective reasoning model from OpenAI's o-series. It excels in math, coding, and visual tasks. The model features a 200K token context window and has a knowledge cutoff of June 2024.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o4-mini 2025-04-16 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o4-mini 2025-04-16 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o4-mini 2025-04-16 Regional	Input: $- Cached Input: $- Output: $-	N/A

GPT-4.1 series

GPT-4.1 series is a highly advanced general-purpose model with extensive world knowledge and an enhanced ability to understand user intent, making it particularly adept at creative tasks and agentic planning. The series features a 1 million token context window and has a knowledge cutoff of June 2024.

Model	Standard Pricing (1M Tokens)	Priority Processing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4.1-2025-04-14 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4.1-2025-04-14 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4.1-2025-04-14 Regional	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-4.1-mini-2025-04-14 Global	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4.1-mini-2025-04-14 Data Zone	Input: $- Cached Input: $- Output: $-	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4.1-mini-2025-04-14 Regional	Input: $- Cached Input: $- Output: $-	N/A	N/A
GPT-4.1-nano-2025-04-14 Global	Input: $- Cached Input: $- Output: $-	N/A	Input: $- Output: $-
GPT-4.1-nano-2025-04-14 Data Zone	Input: $- Cached Input: $- Output: $-	N/A	Input: $- Output: $-
GPT-4.1-nano-2025-04-14 Regional	Input: $- Cached Input: $- Output: $-	N/A	N/A

Sora in Azure OpenAI

Sora is a multimodal generative AI model now available in Azure AI Foundry, designed to help creative teams bring ideas to life through seamless API-first integration. Built on Azure’s enterprise-grade infrastructure, it offers secure, scalable deployment for transforming concepts into high-quality visual content.

Sora 2

Model	Size: Output Resolution	Price per second
Sora 2 Global	Portrait: 720x1280 Landscape: 1280x720	$-

Sora

Price per second	1-5s	6-10s	11-15s	16-20s
480 Square Global	$-	$-	$-	$-
480p Global	$-	$-	$-	$-
720 Square Global	$-	$-	$-	$-
720p Global	$-	$-	$-	$-
1080 Square Global	$-	$-	$-	$-
1080p Global	$-	$-	$-	$-
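
The Sora table above quotes a per-second price that depends on the clip's duration bracket. A minimal sketch of that lookup follows. It assumes the bracket is chosen from the total clip length and that its rate applies to every second of the clip; the page does not spell this out. The price table is a hypothetical placeholder.

```python
import bisect

# Upper bounds and labels of the duration columns in the table above.
BRACKET_UPPER_BOUNDS = [5, 10, 15, 20]
BRACKET_LABELS = ["1-5s", "6-10s", "11-15s", "16-20s"]


def duration_bracket(seconds: int) -> str:
    """Map a clip length in whole seconds to its pricing column."""
    if not 1 <= seconds <= 20:
        raise ValueError("the table only covers clips of 1 to 20 seconds")
    return BRACKET_LABELS[bisect.bisect_left(BRACKET_UPPER_BOUNDS, seconds)]


def video_cost(resolution: str, seconds: int, price_table: dict) -> float:
    """Per-second rate for the clip's bracket multiplied by its length."""
    rate = price_table[resolution][duration_bracket(seconds)]
    return rate * seconds


# Hypothetical per-second prices; not real Azure prices.
example_prices = {
    "480p Global": {"1-5s": 0.5, "6-10s": 0.25, "11-15s": 0.25, "16-20s": 0.25},
}
print(duration_bracket(8))
print(video_cost("480p Global", 8, example_prices))
```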

GPT-Image Series

GPT-Image models enhance DALL-E with better instruction following, accurate text rendering, and support for image input and editing. The models are priced per token, with different pricing for text and image tokens.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-Image-2 Global	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Text: $- Output Image: $-	N/A
GPT-Image-1.5 Global	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Text: $- Output Image: $-	N/A
GPT-Image-1.5 Data Zone	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Text: $- Output Image: $-	N/A
GPT-Image-1-mini Global	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $-	N/A
GPT-Image-1 Global	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $-	N/A
GPT-Image-1 Regional	Input Text: $- Cached Input Text: $- Input Image: $- Cached Input Image: $- Output Image: $-	N/A
GPT-Image-1 Data Zone	Input Text: $- Input Image: $- Output Image: $-	N/A

GPT-4.5

GPT-4.5-preview is the latest general purpose model with deep world knowledge and better understanding of user intent that makes it good at creative tasks and agentic planning. The model has 128K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4.5-Preview-2025-02-27 Global	Input: $- Cached Input: $- Output: $-	N/A

o1

o1 is the new reasoning model series for complex tasks. The model has 200K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o1 2024-12-17 Global	Input: $- Cached Input: $- Output: $-	N/A
o1 2024-12-17 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1 2024-12-17 Regional	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 Global	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1 preview 2024-09-12 Regional	Input: $- Cached Input: $- Output: $-	N/A

Plan with the Pricing Calculator

o3 Mini

o3-mini is the updated version of the o1-mini model. It is a fast, cost-efficient reasoning model tailored to coding, math, and science use cases.

The o3-mini model has an expanded input context window of 200K tokens and a maximum output of 100K tokens, providing ample space for complex and detailed responses. The o1-mini model has a 128K input context window. Both the o3-mini and o1-mini models have a knowledge cutoff of October 2023.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
o3 mini 2025-01-31 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 mini 2025-01-31-US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
o3 mini 2025-01-31 Regional	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 Global	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	N/A
o1-mini 2024-09-12 Regional	Input: $- Cached Input: $- Output: $-	N/A

Open Source Models

gpt-oss-120b offers a high-performing, open and controllable LLM—blending frontier reasoning skills with enterprise-grade flexibility and deployment autonomy.

Model	Pricing (1M Tokens)
gpt-oss-120b	Input: $- Output: $-

Audio Models

GPT-realtime and GPT-audio models are now available via Azure AI Foundry and Azure OpenAI Service, enabling high-fidelity, low-latency voice interactions for production-grade applications. Additional audio models include GPT-4o-transcribe, GPT-4o-mini-transcribe, and GPT-4o-mini-TTS, which deliver advanced speech-to-text and text-to-speech capabilities with emotionally expressive voices, customizable outputs, and superior accuracy. They are well suited to live customer call centers, real-time captioning, and interactive voice agents. The models leverage pretraining and distillation techniques to support natural turn-taking and stable APIs for multimodal deployments.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-Realtime-2 Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- Image Input: $- Cached Input: $-	N/A
GPT-Realtime-Translate Global	Output: $-/hour	N/A
GPT-Realtime-Whisper Global	Output: $-/hour	N/A
GPT-realtime-1.5 Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- Image Input: $- Cached Input: $-	N/A
GPT-audio-1.5 Global	Text Input: $- Output: $- Audio Input: $- Output: $-	N/A
GPT-realtime	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- Image Input: $- Cached Input: $-	N/A
GPT-realtime-mini-2025-12-15 Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $- Image Input: $- Cached Input: $-	N/A
GPT-audio	Text Input: $- Output: $- Audio Input: $- Output: $-	N/A
GPT-audio-mini-2025-12-15 Global	Text Input: $- Output: $- Audio Input: $- Output: $-	N/A
GPT-4o-Transcribe	Text Input: $- Output: $- Audio Input: $- Output: N/A	N/A
GPT-4o-transcribe-diarize	Text Input: $- Output: $- Audio Input: $- Output: N/A	N/A
GPT-4o-mini-transcribe-2025-12-15	Text Input: $- Output: $- Audio Input: $- Output: N/A	N/A
GPT-4o-mini-TTS-2025-12-15	Text Input: $- Output: N/A Audio Input: N/A Output: $-	N/A

Computer-Using Agent (CUA)

The Computer-Using Agent (CUA) is a specialized AI model that allows AI to interact with graphical user interfaces (GUIs), navigate applications, and automate multi-step tasks—all through natural language instructions. The CUA model can be used as a tool in the Responses API.

Model	Pricing
computer-use-preview Global	Input: $-/1M tokens Output: $-/1M tokens

Built-in tools

The Responses API and the Assistants API enable seamless interaction with tools like computer use, code interpreter, function calling, and file search, making it easy for developers to build AI agents.

Tool	Input
Computer Use (Responses API only)	Input: $-/1M tokens Output: $-/1M tokens
File Search Tool Call (Responses API only)	$-/1K tool calls
File Search^*	$-/GB of vector-storage per day (1 GB free)
Code Interpreter^**	$-/session

^*GB refers to binary gigabytes, where 1 GB is 2^30 bytes.

^**If your assistant calls Code Interpreter simultaneously in two different threads, this would create two Code Interpreter sessions (2 * $-). Each session is active by default for one hour, which means that you would only pay this fee once if your user keeps giving instructions to Code Interpreter in the same thread for up to one hour.

Inference cost (input and output) varies based on the GPT model used with each Assistant.
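
To make the Code Interpreter session rule concrete, the sketch below counts billable sessions from a list of calls. It assumes a session starts at the first call in a thread and lasts exactly one hour (the default mentioned above), and that a call arriving at or after the one-hour mark opens a new session. The precise expiry behavior is not described on this page, and the thread names are hypothetical.

```python
from datetime import datetime, timedelta

SESSION_LIFETIME = timedelta(hours=1)  # default session lifetime stated above


def count_sessions(calls: list[tuple[str, datetime]]) -> int:
    """Count billable sessions from (thread_id, call_time) pairs."""
    session_start: dict[str, datetime] = {}
    sessions = 0
    for thread_id, called_at in sorted(calls, key=lambda call: call[1]):
        started = session_start.get(thread_id)
        # A thread with no session, or whose session has expired, opens a new one.
        if started is None or called_at - started >= SESSION_LIFETIME:
            session_start[thread_id] = called_at
            sessions += 1
    return sessions


t0 = datetime(2025, 1, 1, 9, 0)
example_calls = [
    ("thread-a", t0),                          # opens a session for thread-a
    ("thread-b", t0),                          # simultaneous call, second session
    ("thread-a", t0 + timedelta(minutes=30)),  # same thread, within the hour
    ("thread-a", t0 + timedelta(minutes=90)),  # after expiry, third session
]
print(count_sessions(example_calls))
```

Multiplying the session count by the per-session price gives the Code Interpreter charge; inference tokens are billed separately.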

Realtime API

Featured in the Realtime API, the GPT-4o-Realtime-Preview supports multilingual speech-to-speech capabilities. Optimized for real-time, low-latency conversations, it enables natural interactions with minimal delay, ideal for chatbots and conversational AI. GPT-4o is the comprehensive, more powerful version designed for complex tasks, while GPT-4o Mini is a smaller, more affordable option ideal for simpler applications where cost-efficiency and speed are priorities.

Model	Pricing (1M Tokens)
GPT-4o-Realtime-Preview-2024-12-17-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-12-17-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Mini-Realtime-Preview-2024-12-17-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-Global	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-US/EU – Data Zones	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-
GPT-4o-Realtime-Preview-2024-10-01-Regional	Text Input: $- Cached Input: $- Output: $- Audio Input: $- Cached Input: $- Output: $-

Chat Completions API

Featured in the Chat Completions API, the GPT-4o-Audio-Preview model processes and generates audio content. It supports advanced features like speech recognition and audio synthesis, ideal for asynchronous speech interactions and sentiment analysis. GPT-4o is the comprehensive, more powerful version designed for complex tasks, while GPT-4o Mini is a smaller, more affordable option ideal for simpler applications where cost-efficiency and speed are priorities.

Model	Pricing (1M Tokens)
GPT-4o-Audio-Preview-2024-12-17-Global	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Audio-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Audio-Preview-2024-12-17-Regional	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-Global	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-US/EU – Data Zones	Text Input: $- Output: $- Audio Input: $- Output: $-
GPT-4o-Mini-Audio-Preview-2024-12-17-Regional	Text Input: $- Output: $- Audio Input: $- Output: $-

GPT-4o

GPT-4o is the most advanced multimodal model that’s faster and cheaper than GPT-4 Turbo with stronger vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4o-2024-1120 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-1120 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-1120 Regional	Input: $- Cached Input: $- Output: $-	N/A
GPT-4o-2024-08-06 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-08-06 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-08-06 Regional	Input: $- Cached Input: $- Output: $-	N/A
GPT-4o-2024-0513 Global	Input: $- Output: $-	Input: $- Output: $-
GPT-4o-2024-0513 US/EU – Data Zones	Input: $- Output: $-	N/A
GPT-4o-2024-0513 Regional	Input: $- Output: $-	N/A

GPT-4o mini

GPT-4o mini is the most cost-efficient small model, and has vision capabilities. The model has 128K context and an October 2023 knowledge cutoff.

Model	Pricing (1M Tokens)	Pricing with Batch API (1M Tokens)
GPT-4o-mini-0718 Global	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-mini-0718 US/EU – Data Zones	Input: $- Cached Input: $- Output: $-	Input: $- Output: $-
GPT-4o-mini-0718 Regional	Input: $- Cached Input: $- Output: $-	N/A

Provisioned

You can allocate and manage throughput for deployments, ensuring predictable performance and stable capacity. You are charged an hourly rate per model regardless of usage, but you can also secure additional savings through monthly and annual reservations. Discover how to transition your regional deployments and provisioned reservations to global and data zones on this Learn page. To check whether your desired model is available for provisioned pricing in your region, visit this Learn page or contact your local sales representative.

Model	Min PTUs	Per PTU Hourly pricing	Per PTU Monthly Reservation Pricing	Per PTU Yearly Reservation Pricing
GPT-5.5 Global	15	$-	$-	$-
GPT-5.5 Data Zone	15	$-	$-	$-
GPT-5.5 Regional	50	$-	$-	$-
GPT-5.4 (<272k context length) Global	15	$-	$-	$-
GPT-5.4 (<272k context length) Data Zone	15	$-	$-	$-
GPT-5.4 (<272k context length) Regional	50	$-	$-	$-
GPT-5.3 Codex Global	15	$-	$-	$-
GPT-5.3 Codex Data Zone	15	$-	$-	$-
GPT-5.3 Codex Regional	50	$-	$-	$-
GPT-5.2 Codex Global	15	$-	$-	$-
GPT-5.2 Codex Data Zone	15	$-	$-	$-
GPT-5.2 Codex Regional	50	$-	$-	$-
GPT-5.2 Global	15	$-	$-	$-
GPT-5.2 Data Zones	15	$-	$-	$-
GPT-5.2 Regional	50	$-	$-	$-
GPT-5.1 Codex Global	15	$-	$-	$-
GPT-5.1 Codex Data Zones	15	$-	$-	$-
GPT-5.1 Codex Regional	50	$-	$-	$-
GPT-5.1 Global	15	$-	$-	$-
GPT-5.1 Data Zones	15	$-	$-	$-
GPT-5.1 Regional	50	$-	$-	$-
GPT-5-mini Global	15	$-	$-	$-
GPT-5-mini Data Zones	15	$-	$-	$-
GPT-5-mini Regional	50	$-	$-	$-
GPT-5 Global	15	$-	$-	$-
GPT-5 Data Zones	15	$-	$-	$-
GPT-5 Regional	50	$-	$-	$-
GPT-4.1 Global	15	$-	$-	$-
GPT-4.1 Data Zones	15	$-	$-	$-
GPT-4.1 Regional	50	$-	$-	$-
GPT-4.1-mini Global	15	$-	$-	$-
GPT-4.1-mini US/EU Data Zones	15	$-	$-	$-
GPT-4.1-mini Regional	25	$-	$-	$-
GPT-4.1-nano Global	15	$-	$-	$-
GPT-4.1-nano US/EU Data Zones	15	$-	$-	$-
GPT-4.1-nano Regional	25	$-	$-	$-
o3-mini Global	15	$-	$-	$-
o3-mini US/EU Data Zones	15	$-	$-	$-
o3-mini Regional	25	$-	$-	$-
o3 Global	15	$-	$-	$-
o3 US/EU Data Zones	15	$-	$-	$-
o3 Regional	50	$-	$-	$-
o4-mini Global	15	$-	$-	$-
o4-mini US/EU Data Zones	15	$-	$-	$-
o4-mini Regional	25	$-	$-	$-
GPT-4o Global	15	$-	$-	$-
GPT-4o US/EU Data Zones	15	$-	$-	$-
GPT-4o Regional	50	$-	$-	$-
Fine-Tuned GPT-4o-Regional	50	$-	$-	$-
GPT-4o Mini Global	15	$-	$-	$-
GPT-4o Mini US/EU Data Zones	15	$-	$-	$-
GPT-4o Mini Regional	25	$-	$-	$-
Fine-Tuned GPT-4o-Mini Regional	25	$-	$-	$-
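
Because provisioned deployments are billed hourly regardless of usage, a quick way to compare the hourly rate with a monthly reservation is to find the number of hours at which the two cost the same. The sketch below does that and also checks a deployment against the minimum PTUs in the table. The minimums are taken from the GPT-5.5 rows; the rates and the 730-hour month are assumptions for illustration only.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Minimum PTUs from the GPT-5.5 rows of the table above.
MIN_PTUS = {"Global": 15, "Data Zone": 15, "Regional": 50}


def check_min_ptus(deployment: str, ptus: int) -> None:
    """Raise if the requested PTU count is below the listed minimum."""
    minimum = MIN_PTUS[deployment]
    if ptus < minimum:
        raise ValueError(f"{deployment} needs at least {minimum} PTUs, got {ptus}")


def hourly_cost(ptus: int, hourly_rate: float, hours: float) -> float:
    """Cost of running a deployment at the per-PTU hourly rate."""
    return ptus * hourly_rate * hours


def break_even_hours(hourly_rate: float, monthly_rate: float) -> float:
    """Hours per month above which the monthly reservation is cheaper."""
    return monthly_rate / hourly_rate


# Hypothetical per-PTU rates; not real Azure prices.
HOURLY_RATE, MONTHLY_RATE = 1.0, 260.0

check_min_ptus("Global", 15)
print(hourly_cost(15, HOURLY_RATE, HOURS_PER_MONTH))  # always-on month, hourly billing
print(15 * MONTHLY_RATE)                              # same deployment, monthly reservation
print(break_even_hours(HOURLY_RATE, MONTHLY_RATE))
```

With these placeholder rates, a deployment that runs more than 260 hours in a month would cost less under the monthly reservation.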

Base models

Models	Usage per 1,000 tokens
Babbage-002	$-
Davinci-002	$-

Fine-tuning models

Model	Deployment	Pricing
o4-mini (Reinforcement fine-tuning)	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/hour Hosting: $-/hour Grader input: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens Grader cached input: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens Grader output: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens
	Global	Input: $-/1M tokens Output: $-/1M tokens Training: $-/hour Hosting: $-/hour Grader input: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens Grader cached input: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens Grader output: GPT-4o: $-/1M tokens o3-mini: $-/1M tokens
	Developer	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/hour
GPT-4.1	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Developer	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens
GPT-4.1-mini	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Developer	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens
GPT-4.1-nano	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Developer	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens
GPT-4o-2024-08-06	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Developer	Training: $-/1M tokens
GPT-4o-mini	Regional	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Global	Input: $-/1M tokens Cached Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
	Developer	Training: $-/1M tokens
GPT-3.5-Turbo (16K)	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT OSS 20B	Regional	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
GPT OSS 20B	Data Zone	Input: $-/1M tokens Output: $-/1M tokens Training: $-/1M tokens Hosting: $-/hour
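
Most rows in the fine-tuning table combine a training charge per 1M tokens with an hourly hosting charge for the deployed model. The sketch below adds those two components. The rates are hypothetical placeholders, the billed training token count is taken as given (how it is measured is not described here), and treating the Developer rows as having no hosting charge is an assumption based only on those rows listing no hosting rate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FineTuneRates:
    """Hypothetical USD rates; not real Azure prices."""
    training_per_1m: float
    hosting_per_hour: Optional[float]  # None where the table lists no hosting rate


def fine_tune_cost(rates: FineTuneRates, training_tokens: int, hosting_hours: float) -> float:
    """Training charge plus hosting charge for a fine-tuned deployment."""
    training = training_tokens * rates.training_per_1m / 1_000_000
    # Assumption: a missing hosting rate means no hosting charge.
    hosting = 0.0 if rates.hosting_per_hour is None else hosting_hours * rates.hosting_per_hour
    return training + hosting


regional = FineTuneRates(training_per_1m=4.0, hosting_per_hour=2.0)
developer = FineTuneRates(training_per_1m=4.0, hosting_per_hour=None)

print(fine_tune_cost(regional, training_tokens=5_000_000, hosting_hours=24))
print(fine_tune_cost(developer, training_tokens=5_000_000, hosting_hours=24))
```

Inference on the fine-tuned model is billed on top of this at the input, cached input, and output rates shown in the same row.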

Image models

Models	Quality	Resolution	Price (per 100 images)
Dall-E-3	Standard	1024 * 1024	$-
Dall-E-3	Standard	1024 * 1792, 1792 * 1024	$-
Dall-E-3	HD	1024 * 1024	$-
Dall-E-3	HD	1024 * 1792, 1792 * 1024	$-
Dall-E-2	Standard	1024 * 1024	$-

Embedding models

Models	Per 1,000 tokens
Ada	$-
Ada DataZone	$-
text-embedding-3-large	$-
text-embedding-3-large DataZone	$-
text-embedding-3-small	$-
text-embedding-3-small DataZone	$-

Speech Models

Models	Price
Whisper	$-/hour
TTS (Text to Speech)	$-/1M characters
TTS HD	$-/1M characters

Legacy Language Models

Models	Context	Input (Per 1M Tokens)	Output (Per 1M Tokens)
GPT-3.5-Turbo-0301	4K	$-	$-
GPT-3.5-Turbo-0613	4K	$-	$-
GPT-3.5-Turbo-0613	16K	$-	$-
GPT-3.5-Turbo-1106	16K	$-	$-
GPT-3.5-Turbo-0125	16K	$-	$-
GPT-3.5-Turbo-Instruct	4K	$-	$-
GPT-4-Turbo	128K	$-	$-
GPT-4-Turbo-Vision	128K	$-	$-
GPT-4	8K	$-	$-
GPT-4	32K	$-	$-

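Azure pricing and purchasing options

Connect with us directly

Get a walkthrough of Azure pricing. Understand pricing for your cloud solution, learn about cost optimization, and request a custom proposal.

Talk to a sales specialist

See ways to purchase

Purchase Azure services through the Azure website, a Microsoft representative, or an Azure partner.

Explore your options

Additional resources

Frequently asked questions

Frequently asked questions about Azure pricing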

Azure OpenAI Service offers pricing based on both Pay-As-You-Go and Provisioned Throughput Units (PTUs). Pay-As-You-Go lets you pay only for the resources you consume, making it flexible for variable workloads. PTUs offer a predictable pricing model where you reserve and deploy a specific amount of model processing capacity. This model is ideal for workloads with consistent or predictable usage patterns, providing stability and cost control.
Azure Products by Region | Microsoft Azure
SLA for Azure AI Services | Microsoft Azure
To learn more about PTUs and Azure OpenAI pricing, please read the PTU documentation or contact our sales specialist.

Talk to a sales specialist for a walkthrough of Azure pricing. Understand pricing for your cloud solution.

Request a pricing quote

Try Azure for 30 days with free cloud services and a $200 credit.

Try Azure for free
