OpenAI API Pricing Guide 2026: Every Model Compared

OpenAI now has 15 models available via API. That is not a typo. Fifteen. And the pricing spreads across a 300x range, from $0.05 per million input tokens (GPT-5 Nano) to $15 per million input tokens (o1).
If you are building on the OpenAI API in 2026, picking the wrong model can mean the difference between a $50/month bill and a $5,000/month bill for the same workload. This guide breaks down every model, what it actually costs in practice, and when to use each one.
All prices are per million tokens unless stated otherwise. You can plug your own numbers into our LLM Pricing Calculator to get exact estimates.
The Full OpenAI Pricing Table (February 2026)
| Model | Input (per 1M tokens) | Output (per 1M tokens) | Context Window | Speed | Best For |
|---|---|---|---|---|---|
| GPT-5.2 | $1.75 | $14.00 | 400K | Medium | Flagship reasoning + vision |
| GPT-5.1 | $1.25 | $10.00 | 400K | Medium | Previous-gen flagship |
| GPT-5 | $1.25 | $10.00 | 400K | Medium | Previous-gen flagship |
| GPT-5 Mini | $0.25 | $2.00 | 400K | Fast | Cost-effective general use |
| GPT-5 Nano | $0.05 | $0.40 | 400K | Very Fast | High-volume simple tasks |
| GPT-4.1 | $2.00 | $8.00 | 1M+ | Fast | Long-context workloads |
| GPT-4.1 Mini | $0.40 | $1.60 | 1M+ | Very Fast | Budget long-context |
| GPT-4.1 Nano | $0.10 | $0.40 | 1M+ | Very Fast | Cheapest long-context |
| GPT-4o | $2.50 | $10.00 | 128K | Fast | Legacy multimodal |
| GPT-4o Mini | $0.15 | $0.60 | 128K | Very Fast | Legacy budget option |
| o1 | $15.00 | $60.00 | 200K | Slow | Deep reasoning |
| o1 Mini | $1.10 | $4.40 | 128K | Medium | Budget reasoning |
| o3 | $2.00 | $8.00 | 200K | Slow | Current-gen reasoning |
| o3 Mini | $0.50 | $2.00 | 200K | Medium | Lightweight reasoning |
| o4 Mini | $1.10 | $4.40 | 200K | Medium | Latest reasoning (small) |
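Every estimate in this guide uses the same arithmetic: input tokens times the input rate plus output tokens times the output rate. Here is a minimal sketch in Python using rates from the table above (the dictionary keys are informal labels for this example, not official API model identifiers):

```python
# Per-million-token rates (input, output) from the pricing table above.
# Keys are informal labels, not official API model identifiers.
PRICES = {
    "gpt-5.2":      (1.75, 14.00),
    "gpt-5-mini":   (0.25, 2.00),
    "gpt-5-nano":   (0.05, 0.40),
    "gpt-4.1":      (2.00, 8.00),
    "gpt-4.1-mini": (0.40, 1.60),
    "o1":           (15.00, 60.00),
    "o3":           (2.00, 8.00),
}

def monthly_cost(model: str, input_tokens: float, output_tokens: float) -> float:
    """Cost in USD for a month's usage, with token counts given in raw tokens."""
    inp_rate, out_rate = PRICES[model]
    return (input_tokens / 1e6) * inp_rate + (output_tokens / 1e6) * out_rate

# Example: 8M input + 4M output tokens per month on GPT-5 Mini
print(monthly_cost("gpt-5-mini", 8e6, 4e6))  # 10.0
```

The same function reproduces every monthly figure in the use cases below; only the token counts change.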
Understanding the Model Families
OpenAI's lineup breaks into three distinct families, each with a different pricing philosophy.
GPT-5.x: The Workhorse Family
The GPT-5 family is where most production workloads should land. GPT-5.2 is the current flagship at $1.75/$14, but the real story is the tiering below it.
GPT-5 Mini at $0.25/$2 offers roughly GPT-4o-level quality at a fraction of the cost. GPT-5 Nano at $0.05/$0.40 is absurdly cheap and handles classification, extraction, and simple generation tasks well enough for most pipelines.
All GPT-5 models share a 400K context window, which is generous for the price tier.
GPT-4.1: The Long-Context Specialists
The GPT-4.1 family has a unique selling point: a 1M+ token context window. That is 8x larger than GPT-4o's 128K window.
If you are doing document analysis, code review across large repositories, or processing lengthy transcripts, GPT-4.1 models are the practical choice. Compared with GPT-5.2, GPT-4.1 charges slightly more for input ($2 vs $1.75) but significantly less for output ($8 vs $14), and the Nano variant at $0.10/$0.40 makes long-context work accessible.
o-series: Reasoning Models
The o-series models use chain-of-thought reasoning, which means they think before they answer. This makes them slower and more expensive, but substantially better at math, logic, and multi-step problems.
o1 at $15/$60 is the most expensive model in the lineup and should only be used when you genuinely need deep reasoning. o3 at $2/$8 offers a much better price-to-reasoning ratio for most tasks. o3 Mini and o4 Mini at $0.50-1.10 input are reasonable for adding reasoning to cost-sensitive pipelines.
Real Cost Calculations
Let's put these numbers in context with three common use cases.
Use Case 1: Customer Support Chatbot
Assumptions: 10,000 conversations/month, average 800 tokens input (customer message + context), 400 tokens output (bot response). That is roughly 8M input tokens and 4M output tokens per month.
| Model | Monthly Cost | Quality Trade-off |
|---|---|---|
| GPT-5.2 | $70.00 | Best quality, overkill for support |
| GPT-5 Mini | $10.00 | Good quality, best value |
| GPT-5 Nano | $2.00 | Adequate for scripted flows |
| GPT-4o | $60.00 | No reason to use over GPT-5 Mini |
| GPT-4o Mini | $3.60 | Legacy option, consider GPT-5 Nano |
The winner here is GPT-5 Mini at $10/month. It handles conversational tasks well, and the 400K context window means you can stuff plenty of knowledge base content into the system prompt.
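The table above is just the same rate arithmetic looped over each candidate. A self-contained sketch (rates copied from the pricing table; model names are informal labels):

```python
# Chatbot workload: 8M input tokens, 4M output tokens per month.
RATES = {  # (input $/1M, output $/1M); informal labels, not API identifiers
    "GPT-5.2":     (1.75, 14.00),
    "GPT-5 Mini":  (0.25, 2.00),
    "GPT-5 Nano":  (0.05, 0.40),
    "GPT-4o":      (2.50, 10.00),
    "GPT-4o Mini": (0.15, 0.60),
}
INPUT_M, OUTPUT_M = 8, 4  # millions of tokens per month

# Print each model's monthly cost, cheapest first
for model, (inp, out) in sorted(
    RATES.items(), key=lambda kv: kv[1][0] * INPUT_M + kv[1][1] * OUTPUT_M
):
    print(f"{model:12s} ${INPUT_M * inp + OUTPUT_M * out:.2f}/month")
```

Swapping in your own token counts turns this into a quick model-comparison harness for any workload.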
Try this calculation in our pricing calculator
Use Case 2: Document Analysis Pipeline
Assumptions: Processing 500 documents/month, each averaging 15,000 tokens. Output is structured extraction averaging 2,000 tokens per document. That is 7.5M input tokens and 1M output tokens.
| Model | Monthly Cost | Notes |
|---|---|---|
| GPT-4.1 | $23.00 | Best for very long documents (1M context) |
| GPT-4.1 Mini | $4.60 | Sweet spot for document work |
| GPT-5 Mini | $3.88 | Good if docs fit in 400K context |
| GPT-4.1 Nano | $1.15 | If extraction is simple/structured |
For document analysis, GPT-4.1 Mini is the practical choice. The 1M+ context window means you never need to chunk documents, and at $4.60/month it is hard to justify more expensive options unless quality demands it.
Use Case 3: Code Review Agent
Assumptions: 200 pull requests/month, each containing ~5,000 tokens of diff + 3,000 tokens of context. Agent generates ~2,000 tokens of review comments. That is 1.6M input and 0.4M output tokens.
| Model | Monthly Cost | Notes |
|---|---|---|
| o3 | $6.40 | Best reasoning for complex logic |
| o3 Mini | $1.60 | Good reasoning at lower cost |
| GPT-5.2 | $8.40 | Strong but reasoning models better for code review |
| GPT-4.1 | $6.40 | Good if reviewing large files |
For code review, o3 at $6.40/month is worth it. Reasoning models catch logical errors that standard models miss. If budget is tight, o3 Mini at $1.60 is a reasonable compromise.
Batch API: 50% Off
OpenAI offers a Batch API that processes requests asynchronously (results within 24 hours) at 50% of the standard price. If your workload is not latency-sensitive (nightly data processing, bulk classification, content generation pipelines), this is free money.
For example, that document analysis pipeline drops from $4.60/month to $2.30/month with GPT-4.1 Mini on the Batch API.
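As a quick sanity check on that number, the Batch API discount is a flat 50% multiplier on both input and output rates:

```python
# Document pipeline: 7.5M input + 1M output tokens on GPT-4.1 Mini ($0.40/$1.60).
standard = 7.5 * 0.40 + 1 * 1.60  # $/month at standard pricing
batch = standard * 0.5            # Batch API: 50% off both input and output
print(f"{standard:.2f} {batch:.2f}")  # 4.60 2.30
```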
Cached Input Tokens
OpenAI automatically caches prompt prefixes that are reused across requests. Cached tokens cost 50% less. If your system prompt is 2,000 tokens and you send 10,000 requests, that is 20M tokens that get cached pricing instead of full price.
This matters most for high-volume applications with large system prompts. A support bot with a 3,000-token system prompt processing 50,000 requests/month saves roughly 30-40% on input costs.
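A back-of-the-envelope sketch of that savings estimate. The 1,000-token average user message is an assumption for illustration; the 50% cache discount is the figure stated above:

```python
SYSTEM_TOKENS = 3_000   # cached prompt prefix, repeated on every request
USER_TOKENS = 1_000     # assumed average per-request user content (never cached)
REQUESTS = 50_000       # requests per month
RATE = 0.25             # GPT-5 Mini input rate, $/1M tokens
CACHE_DISCOUNT = 0.5    # cached input tokens cost 50% less

full = (SYSTEM_TOKENS + USER_TOKENS) * REQUESTS / 1e6 * RATE
cached = (SYSTEM_TOKENS * CACHE_DISCOUNT + USER_TOKENS) * REQUESTS / 1e6 * RATE
print(f"savings: {1 - cached / full:.0%}")  # savings: 38%
```

The savings fraction depends only on the ratio of system-prompt tokens to total input tokens, which is why large system prompts benefit most.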
Which Model Should You Actually Use?
Here is the decision framework we use with clients:
- Start with GPT-5 Mini ($0.25/$2). It handles 80% of production use cases at a price point that makes cost optimization unnecessary.
- Move to GPT-5.2 ($1.75/$14) only if you measure a quality gap that affects business metrics. Not vibes, metrics.
- Use GPT-4.1 variants when you need the 1M+ context window. Do not pay for long context if you do not need it.
- Use o3 or o3 Mini for tasks that require multi-step reasoning: math, logic, code analysis, planning. Standard models are cheaper but worse at these tasks.
- Avoid o1 ($15/$60) unless you have a specific, validated need for its reasoning depth. o3 covers most reasoning use cases at 87% less cost.
- Avoid GPT-4o ($2.50/$10) for new projects. GPT-5 Mini is cheaper and generally better. GPT-4o is a legacy model at this point.
The Hidden Costs
Token pricing is not the whole story. Watch for:
- Rate limits: Lower-tier models have higher rate limits. If you hit rate limits on GPT-5.2 and need to add retry logic or queuing, factor in the engineering time.
- Latency: o-series models are slow. If your UX requires fast responses, the reasoning models may not work even if the quality is better.
- Output tokens are expensive: Output tokens cost 4-8x more than input tokens across all models. Design your prompts to get concise outputs. A prompt that says "respond in under 100 words" can cut your output costs in half.
Comparing OpenAI to Alternatives
OpenAI is not the only game in town. Here is how the key models stack up:
- GPT-5 Mini ($0.25/$2) vs Claude Sonnet 4 ($3/$15): GPT-5 Mini is 12x cheaper on input. Sonnet 4 is better at nuanced writing and instruction-following, but for most API use cases, the price difference is hard to justify.
- GPT-5.2 ($1.75/$14) vs Gemini 2.5 Pro ($1.25/$10): Similar pricing, but Gemini offers a 1M token context window vs GPT-5.2's 400K. If you need long context without paying GPT-4.1 prices, Gemini is worth evaluating.
- o3 ($2/$8) vs DeepSeek V3.2 Reasoner ($0.28/$0.42): DeepSeek is roughly 7x cheaper on input and nearly 20x cheaper on output. Quality varies by task, but for cost-sensitive reasoning pipelines, it is worth benchmarking.
Use our LLM Pricing Calculator to compare models side-by-side with your actual usage numbers.
How to Estimate Your Token Usage
Before you can calculate costs, you need to know how many tokens your application will use. Here are practical rules of thumb:
- 1 token is approximately 0.75 English words (or 4 characters)
- A typical chatbot message from a user is 50-200 tokens
- A system prompt ranges from 200 tokens (simple) to 4,000+ tokens (complex agent)
- A page of text is roughly 500 tokens
- A 10-page PDF is roughly 5,000-8,000 tokens
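Those rules of thumb translate directly into a rough estimator. This is a sketch: the 0.75 words-per-token ratio is an approximation for English prose and varies by tokenizer and language.

```python
def estimate_tokens(text: str) -> int:
    """Rough English token estimate: ~0.75 words per token (~4 chars per token)."""
    words = len(text.split())
    return round(words / 0.75)

# ~75 words of English text should land near 100 tokens
sample = " ".join(["word"] * 75)
print(estimate_tokens(sample))  # 100
```

For real projections, count tokens with the model's actual tokenizer; word-count estimates are only for first-pass budgeting.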
The most common mistake teams make is underestimating output tokens. If your model generates verbose responses, output costs dominate the bill. A model that outputs 2,000 tokens per response costs 4x more in output than one that outputs 500 tokens, and on GPT-5.2, at $14 per million output tokens, that difference adds up fast at scale.
Measure your actual token usage for a week before committing to cost projections. The difference between estimated and actual usage is often 2-3x.
Pricing Trends
OpenAI has consistently dropped prices over time. GPT-4o launched at higher prices than what GPT-5 Mini costs today for similar capability. The pattern is clear: flagship models get expensive, then get undercut by the next generation's mid-tier model within 6-12 months.
The practical takeaway: do not over-optimize your model choice. Pick the cheapest model that meets your quality bar, and expect that model to get cheaper or be replaced by something better within a year.
---
Prices verified against [OpenAI's official pricing page](https://developers.openai.com/api/docs/pricing) as of February 2026. Use our [LLM Pricing Calculator](/tools/llm-pricing-calculator) to estimate costs for your specific workload.
Need help building AI into your product?
We design, build, and integrate production AI systems. Talk directly with the engineers who'll build your solution.
Get in touch

Written by
Aniket
Aniket Kulkarni is the founder of Curlscape, an AI consulting firm that helps companies build and ship production AI systems. With experience spanning voice agents, LLM evaluation harnesses, and bespoke AI solutions, he works at the intersection of engineering and applied machine learning. He writes about practical AI implementation, model selection, and the tools shaping the AI ecosystem.
Frequently Asked Questions
What is the cheapest OpenAI API model in 2026?
GPT-5 Nano is the cheapest OpenAI model at $0.05 per million input tokens and $0.40 per million output tokens. For legacy models, GPT-4o Mini at $0.15/$0.60 is also very affordable. GPT-5 Nano is best suited for high-volume simple tasks like classification and entity extraction.
Should I use GPT-4o or GPT-5 Mini for new projects?
GPT-5 Mini ($0.25/$2) is the better choice for new projects. It is cheaper than GPT-4o ($2.50/$10) on input tokens and offers a larger 400K context window compared to GPT-4o's 128K. GPT-4o is effectively a legacy model at this point.
How much does it cost to run a chatbot on OpenAI's API?
A typical customer support chatbot processing 10,000 conversations per month costs roughly $10/month on GPT-5 Mini, $2/month on GPT-5 Nano, or $70/month on GPT-5.2. The exact cost depends on conversation length and complexity. Use our LLM Pricing Calculator to estimate costs for your specific volume.
What is the difference between GPT-5.2 and o3?
GPT-5.2 ($1.75/$14) is a general-purpose model optimized for broad tasks including text, vision, and reasoning. o3 ($2/$8) is a dedicated reasoning model that uses chain-of-thought processing, making it slower but better at math, logic, and multi-step problems. Use GPT-5.2 for general tasks and o3 when reasoning quality matters.