Anthropic Claude API Pricing Guide 2026: Opus, Sonnet, and Haiku Compared

TL;DR: Claude Opus 4.6 costs $5/$25 per million tokens (67% cheaper than Opus 4.1). Sonnet 4.6 is $3/$15 and handles 80% of production use cases. Haiku 4.5 at $1/$5 is the speed tier. Batch API saves 50%, prompt caching saves up to 90% on repeated requests. Run your numbers in our LLM Pricing Calculator.
Anthropic's Claude lineup follows a simple three-tier structure: Opus for maximum intelligence, Sonnet for production workloads, Haiku for speed and volume. But with 10 models available via API and pricing that ranges from $0.25 per million input tokens (Haiku 3) to $25 per million output tokens (Opus), picking the right model matters.
The biggest news in Claude pricing this year: Opus got 67% cheaper. Opus 4.5 and 4.6 cost $5/$25 per million tokens, down from $15/$75 on Opus 4.1. That changes the calculus for when Opus makes sense.
This guide covers every Claude model available via API as of March 2026, the cost optimization features that can cut your bill by 50-90%, and how to pick the right tier for your workload.
All prices are per million tokens unless stated otherwise. Run your own numbers in our LLM Pricing Calculator.
The Full Claude Pricing Table (March 2026)
| Model | Input (per 1M) | Output (per 1M) | Context Window | Max Output | Status |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) | 128K | Current flagship |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | 64K | Previous gen |
| Claude Opus 4.1 | $15.00 | $75.00 | 200K | 32K | Legacy |
| Claude Opus 4 | $15.00 | $75.00 | 200K | 32K | Legacy |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) | 64K | Current |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) | 64K | Previous gen |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) | 64K | Previous gen |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 64K | Current |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K | 8K | Legacy |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | 4K | Deprecated (retiring Apr 2026) |
The pattern is clear: Anthropic is competing on capability at stable price points. Sonnet has held at $3/$15 across three generations. Opus dropped from $15/$75 to $5/$25 with the 4.5 release, making it accessible for the first time.
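The per-request math is simple enough to script. Here is a minimal sketch in Python, with current-generation prices hardcoded from the table above (verify them against Anthropic's pricing page before relying on them; model keys are our own labels, not API identifiers):

```python
# Token prices in USD per million tokens, from the table above.
PRICES = {
    "opus-4.6":   {"input": 5.00, "output": 25.00},
    "sonnet-4.6": {"input": 3.00, "output": 15.00},
    "haiku-4.5":  {"input": 1.00, "output": 5.00},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD for a single request at standard (non-batch) rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 2,000-token prompt with a 500-token reply on Sonnet 4.6:
print(round(request_cost("sonnet-4.6", 2_000, 500), 4))  # 0.0135
```

Multiply per-request cost by monthly request volume and you have a budget estimate; the scenarios later in this guide are exactly this calculation.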
The Opus Price Drop: From $15 to $5
This is the single biggest development in Claude pricing. Opus 4.5 (November 2025) and Opus 4.6 (February 2026) cost $5/$25 per million tokens -- a 67% reduction from the $15/$75 that Opus 4 and 4.1 still charge.
What changed? Anthropic improved inference efficiency. Opus 4.6 is more capable than Opus 4.1 and cheaper to run. There is no reason to use Opus 4.1 for new projects.
The practical impact: a workload that cost $225/month on Opus 4.1 now costs $75/month on Opus 4.6. For teams that previously could not justify Opus, $5/$25 makes it viable for mid-volume use cases like document analysis, code review, and complex agentic workflows.
Opus 4.6 also brings a 128K maximum output token limit (up from 32K on Opus 4.1) and access to the 1M context window beta.
Understanding the Three Tiers
Opus: Maximum Intelligence ($5/$25)
Opus 4.6 is Anthropic's most capable model. It excels at complex reasoning, agentic workflows, multi-step problem solving, and tasks where accuracy has high stakes.
Opus 4.6 vs 4.5: Both cost $5/$25. Opus 4.6 adds adaptive thinking, a 128K max output window (vs 64K), and 1M context beta access. Use 4.6 for new projects.
When Opus makes sense:
- Complex analysis where accuracy justifies the cost (legal, medical, financial)
- Agentic workflows with tool use and multi-step reasoning
- Research and synthesis across large document sets
- Tasks where you have tested Sonnet and measured a meaningful quality gap
When Opus does not make sense:
- High-volume chatbots (Sonnet is 40% cheaper on output)
- Simple extraction or classification (Haiku handles it)
- Any task where Sonnet performs within 5% of Opus quality
Sonnet: The Production Workhorse ($3/$15)
Sonnet is where most Claude-based production systems should land. At $3/$15, it delivers strong performance across coding, analysis, writing, and instruction-following.
Sonnet 4.6, 4.5, and 4 all cost $3/$15. Sonnet 4.6 is the latest with improvements to speed and reasoning. There is no cost reason to stay on older versions.
Sonnet's sweet spot:
- Production chatbots and assistants
- Code generation and review
- Content creation and editing
- Structured data extraction from complex documents
- Agentic workflows with tool use
Haiku: Speed and Volume ($0.25 - $1.00 input)
Haiku is built for throughput. The tier spans from Haiku 3 at $0.25/$1.25 (retiring April 2026) to Haiku 4.5 at $1/$5.
The gap between Haiku versions is notable:
- Haiku 4.5 ($1/$5): Current generation. Vision, function-calling, and extended thinking support. Best quality in the tier.
- Haiku 3.5 ($0.80/$4): Previous gen. Vision capable. A reasonable middle ground.
- Haiku 3 ($0.25/$1.25): Deprecated. Retiring April 19, 2026. Migrate to Haiku 4.5.
Haiku works for:
- Classification and routing ("is this a billing question or a technical issue?")
- Simple extraction tasks
- Content moderation and filtering
- High-volume preprocessing before a more expensive model handles the hard parts
Batch API Pricing: 50% Off Everything
Anthropic's Batch API processes requests asynchronously (results within 24 hours) at exactly half the standard price. If your workload is not latency-sensitive, this is the easiest cost optimization available.
| Model | Standard Input | Batch Input | Standard Output | Batch Output |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Haiku 3.5 | $0.80 | $0.40 | $4.00 | $2.00 |
Batch pricing stacks with prompt caching, so you can combine both for even larger savings. A Sonnet workload using batch + prompt caching can cost as little as $0.15 per million cached input tokens -- 95% less than standard input pricing.
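The arithmetic behind that $0.15 figure is just the two multipliers applied to the base rate (a sketch; it assumes the cache-read multiplier and batch discount compound, as described above):

```python
# Effective Sonnet 4.6 input rate when a batch request hits a cached prefix.
BASE_INPUT = 3.00   # USD/MTok, Sonnet 4.6 standard input
CACHE_READ = 0.10   # cache hits bill at 0.1x the base input rate
BATCH = 0.50        # batch API halves the price

effective = BASE_INPUT * CACHE_READ * BATCH
print(round(effective, 2))                   # 0.15
print(round(1 - effective / BASE_INPUT, 2))  # 0.95 -> 95% below standard input
```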
Prompt Caching: Up to 90% Savings on Input
Prompt caching stores frequently used prompt prefixes and charges a reduced rate on subsequent requests. This is critical for any application with a large system prompt.
The pricing works on a multiplier system against the base input price:
| Operation | Multiplier | Sonnet 4.6 Price | Opus 4.6 Price |
| --- | --- | --- | --- |
| Standard input | 1x | $3.00/MTok | $5.00/MTok |
| 5-minute cache write | 1.25x | $3.75/MTok | $6.25/MTok |
| 1-hour cache write | 2x | $6.00/MTok | $10.00/MTok |
| Cache read (hit) | 0.1x | $0.30/MTok | $0.50/MTok |
The math: You pay a premium on the first request (1.25x or 2x) to write the cache. Every subsequent request that hits the cache pays just 0.1x -- a 90% discount. The breakeven is fast:
- 5-minute cache: Pays for itself after 1 cache read
- 1-hour cache: Pays for itself after 2 cache reads
Example: A Sonnet-based assistant with a 4,000-token system prompt handling 20,000 requests/month. Without caching, system prompt costs: 80M tokens x $3/MTok = $240. With caching (5-minute TTL, assuming 95% hit rate): first-request writes cost ~$15, cache reads cost ~$23. Total: ~$38. Savings: $202/month (84%).
Cache pricing stacks with batch API discounts. A batch request hitting a cached prefix gets both discounts.
The 200K Token Trap: Long-Context Pricing
This catches developers off guard. When your input exceeds 200,000 tokens, Anthropic switches to premium pricing -- and it applies to all tokens in the request, not just those above 200K.
| Model | Standard (<=200K input) | Long Context (>200K input) |
| --- | --- | --- |
| Opus 4.6 | $5.00 in / $25.00 out | $10.00 in / $37.50 out |
| Sonnet 4.6 | $3.00 in / $15.00 out | $6.00 in / $22.50 out |
| Sonnet 4.5 | $3.00 in / $15.00 out | $6.00 in / $22.50 out |
A request with 199K input tokens costs $0.60 on Sonnet 4.6. Push it to 201K tokens and the cost jumps to $1.21 -- over 2x more. If you are working near the 200K boundary, it is worth trimming your input to stay under.
The 1M context window is currently in beta, available to organizations in usage tier 4 or with custom rate limits. You need to send the `anthropic-beta: context-1m-2025-08-07` header to enable it.
Extended Thinking: Reasoning at Output Token Prices
Sonnet and Opus support extended thinking, where the model reasons step-by-step before producing a final answer. Think of it as Anthropic's answer to OpenAI's o-series reasoning models.
Key pricing detail: thinking tokens are billed at the standard output rate. A Sonnet response that generates 3,000 thinking tokens + 1,000 output tokens is billed for 4,000 output tokens total.
This means reasoning-heavy tasks get expensive quickly:
- Standard Sonnet response (1,000 output tokens): $0.015
- With extended thinking (3,000 thinking + 1,000 output = 4,000 tokens): $0.06 -- 4x more expensive
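In code, the billing rule is just a sum before multiplying by the output rate (a sketch at Sonnet 4.6's $15/MTok output price):

```python
# Thinking tokens bill at the standard output rate (Sonnet 4.6: $15/MTok).
def sonnet_output_cost(visible_tokens: int, thinking_tokens: int = 0) -> float:
    return (visible_tokens + thinking_tokens) * 15.00 / 1e6

print(sonnet_output_cost(1_000))                         # 0.015
print(sonnet_output_cost(1_000, thinking_tokens=3_000))  # 0.06
```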
Extended thinking is supported on Opus 4.6 (adaptive mode), Opus 4.5, Opus 4.1, Opus 4, Sonnet 4.6, Sonnet 4.5, Sonnet 4, and Haiku 4.5.
Factor thinking token usage into your cost estimates. The billed output count will not match the visible response length.
Web Search and Tool Pricing
Anthropic recently added built-in web search capability to the API:
- Web search: $10 per 1,000 searches ($0.01 per search) plus standard token costs for search-generated content. Failed searches are not billed.
- Web fetch tool: No additional charges beyond standard token costs.
- Code execution: $0.05/hour per container after 1,550 free hours/month.
For agentic applications that need real-time information, web search adds minimal cost. At $0.01 per search, even 10,000 searches/month is only $100.
Rate Limit Tiers
Anthropic uses a tiered system based on your cumulative API spend. Higher tiers unlock more throughput:
| Tier | Spend Required | Requests/min (Sonnet) | Input Tokens/min | Output Tokens/min |
| --- | --- | --- | --- | --- |
| Tier 1 | $0 | 50 | 40,000 | 8,000 |
| Tier 2 | $40 | 1,000 | 80,000 | 16,000 |
| Tier 3 | $200 | 2,000 | 160,000 | 32,000 |
| Tier 4 | $2,000 | 4,000 | 400,000 | 80,000 |
Tier 4 also unlocks the 1M context window beta. Rate limits are enforced per model, so using Sonnet and Haiku simultaneously gives you separate quotas for each.
If you are building a production application, plan for tier 3 or 4. Tier 1 limits (50 RPM) are too low for anything beyond prototyping.
Real-World Cost Scenarios
Scenario 1: B2B SaaS AI Feature (5M Input + 2M Output Tokens/Month)
A moderate workload -- a few hundred daily active users hitting an AI feature.
| Model | Monthly Cost | With Batch API | With Caching |
| --- | --- | --- | --- |
| Opus 4.6 | $75.00 | $37.50 | ~$32.00 |
| Sonnet 4.6 | $45.00 | $22.50 | ~$19.00 |
| Haiku 4.5 | $15.00 | $7.50 | ~$6.50 |
Most teams should start with Sonnet 4.6 here. At $45/month (or ~$19 with caching), the cost is negligible compared to the engineering salaries building the feature.
Run this calculation with your numbers
Scenario 2: AI Content Platform (3M Input + 5M Output Tokens/Month)
1,000 articles/month, each with ~3,000 input tokens and ~5,000 output tokens.
| Model | Monthly Cost | Per-Article Cost |
| --- | --- | --- |
| Opus 4.6 | $140.00 | $0.14 |
| Sonnet 4.6 | $84.00 | $0.084 |
| Haiku 4.5 | $28.00 | $0.028 |
Sonnet is the practical choice for content generation. At $0.084 per article, cost is not a factor. Opus 4.6 at $0.14 per article is now viable for premium content where quality has measurable business impact.
Scenario 3: Customer Support Triage (25M Input + 5M Output Tokens/Month)
50,000 tickets/month. Each ticket: ~500 tokens input, ~100 tokens output for classification.
| Model | Monthly Cost |
| --- | --- |
| Opus 4.6 | $250.00 |
| Sonnet 4.6 | $150.00 |
| Haiku 4.5 | $50.00 |
Haiku 4.5 at $50/month is the clear winner. Classification accuracy for well-defined categories is high even on smaller models. The 3x savings over Sonnet is money better spent elsewhere.
Scenario 4: Agentic Coding Assistant (Heavy Extended Thinking)
A development team of 10, each making ~50 requests/day with extended thinking enabled. Average: 2,000 input tokens, 8,000 output tokens (including 5,000 thinking tokens) per request.
Monthly: ~15,000 requests, so ~30M input tokens and ~120M output tokens.
| Model | Monthly Cost | Notes |
| --- | --- | --- |
| Opus 4.6 | $3,150.00 | Best reasoning quality |
| Sonnet 4.6 | $1,890.00 | Strong reasoning, 40% cheaper |
Extended thinking makes output tokens the dominant cost. At 120M output tokens, Sonnet's output alone costs $1,800/month. For budget-constrained teams, consider limiting the thinking budget or routing simpler queries to Haiku.
Claude vs the Competition: March 2026
How does Claude stack up against other major API providers?
| Claude Model | Price (in/out) | Competitor | Price (in/out) | Verdict |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5/$25 | GPT-5.2 | $1.75/$14 | OpenAI 65% cheaper on input |
| Opus 4.6 | $5/$25 | Gemini 2.5 Pro | $1.25/$10 | Gemini 75% cheaper, 1M context |
| Sonnet 4.6 | $3/$15 | GPT-5.2 | $1.75/$14 | OpenAI cheaper, similar quality |
| Sonnet 4.6 | $3/$15 | Gemini 2.5 Flash | $0.30/$2.50 | Gemini 10x cheaper on input |
| Haiku 4.5 | $1/$5 | GPT-5 Mini | $0.25/$2 | OpenAI 4x cheaper |
| Haiku 4.5 | $1/$5 | DeepSeek V3.2 | $0.28/$0.42 | DeepSeek 3.5x cheaper on input, 12x on output |
| Haiku 4.5 | $1/$5 | Grok 4.1 Fast | $0.20/$0.50 | xAI 5x cheaper, 2M context |
On raw price, Claude is the most expensive mainstream API at every tier. OpenAI undercuts on input pricing, Google undercuts on both, and DeepSeek is in a different league on cost.
But price is not the whole story. Claude is widely regarded as stronger at:
- Following complex, nuanced instructions without drift
- Maintaining consistent tone and style across long outputs
- Code generation quality (Sonnet is a developer favorite)
- Handling ambiguity without hallucinating
- Agentic tool use reliability
The right approach: benchmark both on your specific task. If Claude's quality advantage translates to measurable business value (fewer errors, better user satisfaction, less post-processing), the premium pays for itself. For many teams, it does.
For a detailed OpenAI pricing breakdown, see our OpenAI API Pricing Guide 2026. For a broader comparison across all providers, see our LLM Pricing Comparison.
Claude Code and Subscription Plans vs API
If you use Claude through the consumer products (claude.ai) or Claude Code (the CLI tool), subscription plans may be cheaper than the API for individual use:
| Plan | Price | Best For |
| --- | --- | --- |
| Free | $0/month | Light experimentation |
| Pro | $20/month | Regular individual use |
| Max | $100/month (5x) or $200/month (20x) | Heavy Claude Code users |
| Team | $30/seat/month | Small teams |
| Enterprise | Custom | Large organizations |
For heavy Claude Code users, the Max plan at $100-200/month can be dramatically cheaper than equivalent API usage. One developer reported using 10 billion tokens over 8 months via Max subscription -- the equivalent API cost would have been $15,000+.
The API is the right choice when you need programmatic access, custom integrations, batch processing, or control over model parameters. Subscriptions are better for interactive use.
Cost Optimization Playbook
Here are the strategies we use with clients to reduce Claude API costs by 50-90%:
1. Model routing (40-60% savings). Use a two-stage pipeline: Haiku classifies or routes requests, Sonnet handles complex cases, Opus handles edge cases. A typical distribution (70% Haiku / 25% Sonnet / 5% Opus) cuts costs dramatically compared to running everything through Sonnet.
2. Prompt caching (up to 90% on input). If your system prompt exceeds 1,000 tokens and you make more than a few requests per 5-minute window, enable caching. The ROI is immediate.
3. Batch API (50% savings). Any workload that can tolerate 24-hour latency -- nightly processing, bulk analysis, content generation pipelines -- should use batch.
4. Output token management. Output tokens cost 5x more than input across all tiers. A prompt that says "respond concisely in under 150 words" can halve your output costs. Design for brevity.
5. Stay under 200K input tokens. If you are near the boundary, trim context to avoid the 2x long-context surcharge on the entire request.
6. Stack discounts. Batch + caching multipliers compound. A Sonnet batch request hitting a 5-minute cache costs just $0.15/MTok input (vs $3.00 standard) -- a 95% reduction.
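To see why routing pays off, compare blended per-MTok rates under the 70/25/5 mix mentioned above against an all-Sonnet baseline (a sketch; real savings depend on your actual traffic split and per-tier token counts):

```python
# Blended per-MTok rates for a 70/25/5 Haiku/Sonnet/Opus routing mix,
# versus sending everything to Sonnet. Rates from the pricing table.
MIX = {"haiku-4.5": 0.70, "sonnet-4.6": 0.25, "opus-4.6": 0.05}
RATES = {  # USD per million tokens: (input, output)
    "haiku-4.5":  (1.00, 5.00),
    "sonnet-4.6": (3.00, 15.00),
    "opus-4.6":   (5.00, 25.00),
}

blended_in  = sum(share * RATES[m][0] for m, share in MIX.items())
blended_out = sum(share * RATES[m][1] for m, share in MIX.items())
print(round(blended_in, 2), round(blended_out, 2))  # 1.7 8.5
print(round(1 - blended_in / 3.00, 2))              # 0.43 -> ~43% cheaper
```

At this mix the blended rate lands about 43% below all-Sonnet on both input and output, squarely in the 40-60% range quoted above.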
The Decision Framework
Here is how we advise clients to choose:
- Start with Sonnet 4.6 ($3/$15) for any user-facing feature. It handles chatbots, code assistants, content generation, and data extraction well. This is the right default for 80% of production use cases.
- Use Haiku 4.5 ($1/$5) for classification, routing, moderation, and any task with objectively measurable success criteria. Test Haiku first -- you might not need Sonnet.
- Use Opus 4.6 ($5/$25) for high-stakes tasks where errors have real consequences. At the new price point, it is justifiable for document review, complex reasoning, and agentic workflows. Do not use it for tasks where Sonnet performs within 5% of Opus quality.
- Avoid legacy Opus ($15/$75). Opus 4.6 is more capable and 67% cheaper. There is no reason to use Opus 4.1 or 4 for new projects.
- Migrate off Haiku 3. It is deprecated and retiring April 2026. Move to Haiku 4.5.
- Do not pay for Opus when you mean Sonnet. We see this constantly. Teams default to the biggest model "just to be safe" and spend 67% more than necessary. Test Sonnet first. Measure. Upgrade with data, not anxiety.
Pricing Trends
Anthropic's pricing strategy is becoming clear: hold Sonnet and Haiku prices steady while dropping Opus aggressively.
- Opus: $15/$75 (4.0/4.1) to $5/$25 (4.5/4.6) -- a 67% drop in 6 months
- Sonnet: $3/$15 across three generations (4, 4.5, 4.6) -- rock stable
- Haiku: $0.25/$1.25 to $1/$5 -- actually increased with Haiku 4.5, reflecting much better capability
Expect this pattern to continue. Opus will likely approach Sonnet pricing within 12-18 months as inference efficiency improves. Sonnet may hold at $3/$15 through 2027 while getting more capable with each generation.
The practical takeaway: do not over-optimize model selection today. Pick the cheapest tier that meets your quality bar and revisit quarterly. The cost floor is still dropping.
---
Prices verified against [Anthropic's official pricing page](https://platform.claude.com/docs/en/about-claude/pricing) as of March 2026. Compare Claude against OpenAI, Google, and other providers in our [LLM Pricing Calculator](/tools/llm-pricing-calculator).
Need help building AI into your product?
We design, build, and integrate production AI systems. Talk directly with the engineers who'll build your solution.
Get in touch
Written by
Aniket Kulkarni
Aniket Kulkarni is the founder of Curlscape, an AI consulting firm that helps companies build and ship production AI systems. With experience spanning voice agents, LLM evaluation harnesses, and bespoke AI solutions, he works at the intersection of engineering and applied machine learning. He writes about practical AI implementation, model selection, and the tools shaping the AI ecosystem.
Frequently Asked Questions
How much does the Claude API cost per million tokens?
Claude API pricing depends on the model tier. As of March 2026: Opus 4.6 costs $5 input / $25 output per million tokens, Sonnet 4.6 costs $3 input / $15 output, and Haiku 4.5 costs $1 input / $5 output. Batch API offers 50% off these prices, and prompt caching can reduce input costs by up to 90%.
What is the cheapest Claude API model?
Claude Haiku 3 at $0.25/$1.25 per million tokens is the cheapest, but it is deprecated and retiring April 2026. The cheapest current-generation model is Haiku 4.5 at $1/$5 per million tokens. With batch API pricing, Haiku 4.5 drops to $0.50/$2.50.
What is the difference between Claude Opus, Sonnet, and Haiku?
Opus ($5/$25 per million tokens) is the most intelligent model, best for complex reasoning, agentic workflows, and high-stakes tasks. Sonnet ($3/$15) is the production workhorse for chatbots, coding, and content. Haiku ($1/$5) is optimized for speed and volume -- classification, routing, and preprocessing. Most production systems should default to Sonnet.
Is Claude cheaper than OpenAI GPT-5?
No. OpenAI undercuts Anthropic at every tier. Claude Sonnet 4.6 costs $3/$15 vs GPT-5.2 at $1.75/$14. Claude Haiku 4.5 costs $1/$5 vs GPT-5 Mini at $0.25/$2. However, Claude is widely preferred for instruction-following quality, coding, and nuanced tasks, so the price premium may be justified by measurable quality improvements.
How can I reduce my Claude API costs?
Five strategies: (1) Use prompt caching for up to 90% savings on repeated input. (2) Use batch API for 50% off non-latency-sensitive workloads. (3) Route requests by complexity -- Haiku for simple tasks, Sonnet for complex ones. (4) Keep input under 200K tokens to avoid the 2x long-context surcharge. (5) Design prompts for concise output, since output tokens cost 5x more than input.
What happens when I exceed 200K input tokens on Claude?
When your input exceeds 200,000 tokens, Anthropic applies premium pricing to the entire request, not just the tokens above 200K. On Sonnet 4.6, input price doubles from $3 to $6 per million tokens, and output increases from $15 to $22.50. Stay under 200K when possible, or budget for the 2x surcharge.
Does Claude charge for extended thinking tokens?
Yes. Extended thinking tokens are billed at the standard output token rate. A Sonnet 4.6 response with 3,000 thinking tokens and 1,000 visible output tokens costs the same as 4,000 output tokens ($0.06 total). The billed token count will not match the visible response length.
Does Anthropic offer a free tier for the Claude API?
New accounts receive initial credits to experiment with the API. There is no permanent free tier for ongoing use. For budget-conscious development, Haiku 4.5 at $1/$5 per million tokens is the most affordable current-generation model, and batch API pricing halves that to $0.50/$2.50.
Should I use Claude API or a Claude subscription plan?
For programmatic access, batch processing, and custom integrations, use the API. For interactive use and Claude Code, subscription plans (Pro at $20/month, Max at $100-200/month) can be dramatically cheaper. Heavy Claude Code users report that the Max plan saves thousands compared to equivalent API usage.
What are Claude API rate limits?
Claude uses a tiered rate limit system based on cumulative API spend. Tier 1 (free) allows 50 requests/minute for Sonnet. Tier 2 ($40 spend) increases to 1,000 RPM. Tier 3 ($200 spend) allows 2,000 RPM. Tier 4 ($2,000 spend) unlocks 4,000 RPM and the 1M context window beta. Limits are enforced per model.