
Anthropic Claude API Pricing Guide 2026: Opus, Sonnet, and Haiku Compared

TL;DR: Claude Opus 4.6 costs $5/$25 per million tokens (67% cheaper than Opus 4.1). Sonnet 4.6 is $3/$15 and handles 80% of production use cases. Haiku 4.5 at $1/$5 is the speed tier. Batch API saves 50%, prompt caching saves up to 90% on repeated requests. Run your numbers in our LLM Pricing Calculator.

Anthropic's Claude lineup follows a simple three-tier structure: Opus for maximum intelligence, Sonnet for production workloads, Haiku for speed and volume. But with 10 models available via API and per-token rates spanning $0.25 to $75 per million, picking the right model matters.

The biggest news in Claude pricing this year: Opus got 67% cheaper. Opus 4.5 and 4.6 cost $5/$25 per million tokens, down from $15/$75 on Opus 4.1. That changes the calculus for when Opus makes sense.

This guide covers every Claude model available via API as of March 2026, the cost optimization features that can cut your bill by 50-90%, and how to pick the right tier for your workload.

All prices are per million tokens unless stated otherwise. Run your own numbers in our LLM Pricing Calculator.

The Full Claude Pricing Table (March 2026)

| Model | Input (per 1M) | Output (per 1M) | Context Window | Max Output | Status |
| --- | --- | --- | --- | --- | --- |
| Claude Opus 4.6 | $5.00 | $25.00 | 200K (1M beta) | 128K | Current flagship |
| Claude Opus 4.5 | $5.00 | $25.00 | 200K | 64K | Previous gen |
| Claude Opus 4.1 | $15.00 | $75.00 | 200K | 32K | Legacy |
| Claude Opus 4 | $15.00 | $75.00 | 200K | 32K | Legacy |
| Claude Sonnet 4.6 | $3.00 | $15.00 | 200K (1M beta) | 64K | Current flagship |
| Claude Sonnet 4.5 | $3.00 | $15.00 | 200K (1M beta) | 64K | Previous gen |
| Claude Sonnet 4 | $3.00 | $15.00 | 200K (1M beta) | 64K | Previous gen |
| Claude Haiku 4.5 | $1.00 | $5.00 | 200K | 64K | Current |
| Claude Haiku 3.5 | $0.80 | $4.00 | 200K | 8K | Legacy |
| Claude Haiku 3 | $0.25 | $1.25 | 200K | 4K | Deprecated (retiring Apr 2026) |

The pattern is clear: Anthropic is competing on capability at stable price points. Sonnet has held at $3/$15 across three generations. Opus dropped from $15/$75 to $5/$25 with the 4.5 release, making it accessible for the first time.
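The tier arithmetic is easy to sanity-check in a few lines. This is an illustrative sketch, not SDK code: the prices are hardcoded from the table above and `cost_usd` is a made-up helper.

```python
# Per-million-token prices from the table above: (input, output) in USD.
PRICES = {
    "opus-4.6":   (5.00, 25.00),
    "opus-4.1":   (15.00, 75.00),
    "sonnet-4.6": (3.00, 15.00),
    "haiku-4.5":  (1.00, 5.00),
}

def cost_usd(model: str, input_tokens: int, output_tokens: int) -> float:
    """Standard (non-batch, non-cached) cost for one workload."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# The Opus price drop in practice: 5M input + 2M output tokens per month.
print(cost_usd("opus-4.1", 5_000_000, 2_000_000))  # 225.0
print(cost_usd("opus-4.6", 5_000_000, 2_000_000))  # 75.0
```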

The Opus Price Drop: From $15 to $5

This is the single biggest development in Claude pricing. Opus 4.5 (November 2025) and Opus 4.6 (February 2026) cost $5/$25 per million tokens -- a 67% reduction from the $15/$75 that Opus 4 and 4.1 still charge.

What changed? Anthropic improved inference efficiency. Opus 4.6 is more capable than Opus 4.1 and cheaper to run. There is no reason to use Opus 4.1 for new projects.

The practical impact: a workload that cost $225/month on Opus 4.1 now costs $75/month on Opus 4.6. For teams that previously could not justify Opus, $5/$25 makes it viable for mid-volume use cases like document analysis, code review, and complex agentic workflows.

Opus 4.6 also brings a 128K maximum output token limit (up from 32K on Opus 4.1) and access to the 1M context window beta.

Understanding the Three Tiers

Opus: Maximum Intelligence ($5/$25)

Opus 4.6 is Anthropic's most capable model. It excels at complex reasoning, agentic workflows, multi-step problem solving, and tasks where accuracy has high stakes.

Opus 4.6 vs 4.5: Both cost $5/$25. Opus 4.6 adds adaptive thinking, a 128K max output window (vs 64K), and 1M context beta access. Use 4.6 for new projects.

When Opus makes sense:

  • Complex analysis where accuracy justifies the cost (legal, medical, financial)
  • Agentic workflows with tool use and multi-step reasoning
  • Research and synthesis across large document sets
  • Tasks where you have tested Sonnet and measured a meaningful quality gap

When Opus does not make sense:

  • High-volume chatbots (Sonnet is 40% cheaper on output)
  • Simple extraction or classification (Haiku handles it)
  • Any task where Sonnet performs within 5% of Opus quality

Sonnet: The Production Workhorse ($3/$15)

Sonnet is where most Claude-based production systems should land. At $3/$15, it delivers strong performance across coding, analysis, writing, and instruction-following.

Sonnet 4.6, 4.5, and 4 all cost $3/$15. Sonnet 4.6 is the latest with improvements to speed and reasoning. There is no cost reason to stay on older versions.

Sonnet's sweet spot:

  • Production chatbots and assistants
  • Code generation and review
  • Content creation and editing
  • Structured data extraction from complex documents
  • Agentic workflows with tool use

Haiku: Speed and Volume ($0.25 - $1.00 input)

Haiku is built for throughput. The tier spans from Haiku 3 at $0.25/$1.25 (retiring April 2026) to Haiku 4.5 at $1/$5.

The gap between Haiku versions is notable:

  • Haiku 4.5 ($1/$5): Current generation. Vision, function-calling, and extended thinking support. Best quality in the tier.
  • Haiku 3.5 ($0.80/$4): Previous gen. Vision capable. A reasonable middle ground.
  • Haiku 3 ($0.25/$1.25): Deprecated. Retiring April 19, 2026. Migrate to Haiku 4.5.

Haiku works for:

  • Classification and routing ("is this a billing question or a technical issue?")
  • Simple extraction tasks
  • Content moderation and filtering
  • High-volume preprocessing before a more expensive model handles the hard parts

Batch API Pricing: 50% Off Everything

Anthropic's Batch API processes requests asynchronously (results within 24 hours) at exactly half the standard price. If your workload is not latency-sensitive, this is the easiest cost optimization available.

| Model | Standard Input | Batch Input | Standard Output | Batch Output |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5.00 | $2.50 | $25.00 | $12.50 |
| Sonnet 4.6 | $3.00 | $1.50 | $15.00 | $7.50 |
| Haiku 4.5 | $1.00 | $0.50 | $5.00 | $2.50 |
| Haiku 3.5 | $0.80 | $0.40 | $4.00 | $2.00 |

Batch pricing stacks with prompt caching, so you can combine both for even larger savings. A Sonnet workload using batch + prompt caching can cost as little as $0.15 per million cached input tokens -- 95% less than standard input pricing.

Prompt Caching: Up to 90% Savings on Input

Prompt caching stores frequently used prompt prefixes and charges a reduced rate on subsequent requests. This is critical for any application with a large system prompt.
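Enabling caching takes one extra field on the request. This sketch only builds the Messages API request body as described in Anthropic's prompt caching docs; the model ID, prompt, and user message are placeholder values:

```python
# A long system prompt worth caching (imagine ~4,000 tokens in practice).
LONG_SYSTEM_PROMPT = "You are a support assistant for ..."  # placeholder

request_body = {
    "model": "claude-sonnet-4-6",  # placeholder model ID -- check the docs
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": LONG_SYSTEM_PROMPT,
            # Marks everything up to and including this block as cacheable.
            # Later requests reusing the exact prefix bill at the 0.1x
            # cache-read rate instead of the full input price.
            "cache_control": {"type": "ephemeral"},  # 5-minute TTL
        }
    ],
    "messages": [{"role": "user", "content": "Where is my invoice?"}],
}
```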

The pricing works on a multiplier system against the base input price:

| Operation | Multiplier | Sonnet 4.6 Price | Opus 4.6 Price |
| --- | --- | --- | --- |
| Standard input | 1x | $3.00/MTok | $5.00/MTok |
| 5-minute cache write | 1.25x | $3.75/MTok | $6.25/MTok |
| 1-hour cache write | 2x | $6.00/MTok | $10.00/MTok |
| Cache read (hit) | 0.1x | $0.30/MTok | $0.50/MTok |

The math: You pay a premium on the first request (1.25x or 2x) to write the cache. Every subsequent request that hits the cache pays just 0.1x -- a 90% discount. The breakeven is fast:

  • 5-minute cache: Pays for itself after 1 cache read
  • 1-hour cache: Pays for itself after 2 cache reads

Example: A Sonnet-based assistant with a 4,000-token system prompt handling 20,000 requests/month. Without caching, system prompt costs: 80M tokens x $3/MTok = $240. With caching (5-minute TTL, assuming 95% hit rate): first-request writes cost ~$15, cache reads cost ~$23. Total: ~$38. Savings: $202/month (84%).
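That example works out as follows, using the multipliers from the table above (the 95% hit rate is the stated assumption, not a guarantee):

```python
BASE_INPUT = 3.00                    # Sonnet 4.6 input, $/MTok
WRITE_MULT, READ_MULT = 1.25, 0.10   # 5-minute cache multipliers
PROMPT_TOKENS, REQUESTS, HIT_RATE = 4_000, 20_000, 0.95

def mtok(tokens: float) -> float:
    return tokens / 1_000_000

# System-prompt cost with no caching: 80M tokens at the standard rate.
no_cache = mtok(PROMPT_TOKENS * REQUESTS) * BASE_INPUT
# Cache misses pay the write premium; hits pay the 0.1x read rate.
writes = mtok(PROMPT_TOKENS * REQUESTS * (1 - HIT_RATE)) * BASE_INPUT * WRITE_MULT
reads = mtok(PROMPT_TOKENS * REQUESTS * HIT_RATE) * BASE_INPUT * READ_MULT

print(round(no_cache, 2))        # 240.0
print(round(writes + reads, 2))  # 37.8
```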

Cache pricing stacks with batch API discounts. A batch request hitting a cached prefix gets both discounts.

The 200K Token Trap: Long-Context Pricing

This catches developers off guard. When your input exceeds 200,000 tokens, Anthropic switches to premium pricing -- and it applies to all tokens in the request, not just those above 200K.

| Model | Standard (<=200K input) | Long Context (>200K input) |
| --- | --- | --- |
| Opus 4.6 | $5.00 in / $25.00 out | $10.00 in / $37.50 out |
| Sonnet 4.6 | $3.00 in / $15.00 out | $6.00 in / $22.50 out |
| Sonnet 4.5 | $3.00 in / $15.00 out | $6.00 in / $22.50 out |

A request with 199K input tokens costs $0.60 in input on Sonnet 4.6. Push it to 201K tokens and the input cost jumps to $1.21 -- more than double. If you are working near the 200K boundary, it is worth trimming your input to stay under.
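The cliff is easy to model. A sketch with Sonnet 4.6 rates hardcoded (`sonnet_input_cost` is an illustrative helper, not an API):

```python
def sonnet_input_cost(input_tokens: int) -> float:
    """Input-side cost in USD. Crossing 200K switches the rate
    for ALL input tokens, not just the overage."""
    rate = 6.00 if input_tokens > 200_000 else 3.00  # $/MTok
    return input_tokens * rate / 1_000_000

print(round(sonnet_input_cost(199_000), 2))  # 0.6
print(round(sonnet_input_cost(201_000), 2))  # 1.21
```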

The 1M context window is currently in beta, available to organizations in usage tier 4 or with custom rate limits. You need to send the `anthropic-beta: context-1m-2025-08-07` header to enable it.

Extended Thinking: Reasoning at Output Token Prices

Sonnet and Opus support extended thinking, where the model reasons step-by-step before producing a final answer. Think of it as Anthropic's answer to OpenAI's o-series reasoning models.

Key pricing detail: thinking tokens are billed at the standard output rate. A Sonnet response that generates 3,000 thinking tokens + 1,000 output tokens is billed for 4,000 output tokens total.

This means reasoning-heavy tasks get expensive quickly:

  • Standard Sonnet response (1,000 output tokens): $0.015
  • With extended thinking (3,000 thinking + 1,000 output = 4,000 tokens): $0.06 -- 4x more expensive
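The billing rule is simple enough to sketch (Sonnet 4.6 output rate hardcoded; `response_cost` is an illustrative helper):

```python
SONNET_OUTPUT = 15.00  # $/MTok -- thinking tokens bill at this same rate

def response_cost(thinking_tokens: int, visible_tokens: int) -> float:
    """Thinking and visible output tokens are billed identically."""
    return (thinking_tokens + visible_tokens) * SONNET_OUTPUT / 1_000_000

print(response_cost(0, 1_000))      # 0.015 -- standard response
print(response_cost(3_000, 1_000))  # 0.06  -- same visible answer, 4x the cost
```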

Extended thinking is supported on Opus 4.6 (adaptive mode), Opus 4.5, Opus 4.1, Opus 4, Sonnet 4.6, Sonnet 4.5, Sonnet 4, and Haiku 4.5.

Factor thinking token usage into your cost estimates. The billed output count will not match the visible response length.

Web Search and Tool Pricing

Anthropic recently added built-in web search capability to the API:

  • Web search: $10 per 1,000 searches ($0.01 per search) plus standard token costs for search-generated content. Failed searches are not billed.
  • Web fetch tool: No additional charges beyond standard token costs.
  • Code execution: $0.05/hour per container after 1,550 free hours/month.

For agentic applications that need real-time information, web search adds minimal cost. At $0.01 per search, even 10,000 searches/month is only $100.

Rate Limit Tiers

Anthropic uses a tiered system based on your cumulative API spend. Higher tiers unlock more throughput:

| Tier | Spend Required | Requests/min (Sonnet) | Input Tokens/min | Output Tokens/min |
| --- | --- | --- | --- | --- |
| Tier 1 | $0 | 50 | 40,000 | 8,000 |
| Tier 2 | $40 | 1,000 | 80,000 | 16,000 |
| Tier 3 | $200 | 2,000 | 160,000 | 32,000 |
| Tier 4 | $2,000 | 4,000 | 400,000 | 80,000 |

Tier 4 also unlocks the 1M context window beta. Rate limits are enforced per model, so using Sonnet and Haiku simultaneously gives you separate quotas for each.

If you are building a production application, plan for tier 3 or 4. Tier 1 limits (50 RPM) are too low for anything beyond prototyping.

Real-World Cost Scenarios

Scenario 1: B2B SaaS AI Feature (5M Input + 2M Output Tokens/Month)

A moderate workload -- a few hundred daily active users hitting an AI feature.

| Model | Monthly Cost | With Batch API | With Caching |
| --- | --- | --- | --- |
| Opus 4.6 | $75.00 | $37.50 | ~$32.00 |
| Sonnet 4.6 | $45.00 | $22.50 | ~$19.00 |
| Haiku 4.5 | $15.00 | $7.50 | ~$6.50 |

Most teams should start with Sonnet 4.6 here. At $45/month (or ~$19 with caching), the cost is negligible compared to the engineering salaries building the feature.

Run this calculation with your numbers

Scenario 2: AI Content Platform (3M Input + 5M Output Tokens/Month)

1,000 articles/month, each with ~3,000 input tokens and ~5,000 output tokens.

| Model | Monthly Cost | Per-Article Cost |
| --- | --- | --- |
| Opus 4.6 | $140.00 | $0.14 |
| Sonnet 4.6 | $84.00 | $0.084 |
| Haiku 4.5 | $28.00 | $0.028 |

Sonnet is the practical choice for content generation. At $0.084 per article, cost is not a factor. Opus 4.6 at $0.14 per article is now viable for premium content where quality has measurable business impact.

Try this calculation

Scenario 3: Customer Support Triage (25M Input + 5M Output Tokens/Month)

50,000 tickets/month. Each ticket: ~500 tokens input, ~100 tokens output for classification.

| Model | Monthly Cost |
| --- | --- |
| Opus 4.6 | $250.00 |
| Sonnet 4.6 | $150.00 |
| Haiku 4.5 | $50.00 |

Haiku 4.5 at $50/month is the clear winner. Classification accuracy for well-defined categories is high even on smaller models. The 3x savings over Sonnet is money better spent elsewhere.

Try this calculation

Scenario 4: Agentic Coding Assistant (Heavy Extended Thinking)

A development team of 10, each making ~50 requests/day with extended thinking enabled. Average: 1,000 input tokens, 8,000 output tokens (including 5,000 thinking tokens) per request.

Monthly: ~15M input tokens, ~120M output tokens.

| Model | Monthly Cost | Notes |
| --- | --- | --- |
| Opus 4.6 | $3,075.00 | Best reasoning quality |
| Sonnet 4.6 | $1,845.00 | Strong reasoning, 40% cheaper |

Extended thinking makes output tokens the dominant cost. At 120M output tokens, even Sonnet costs $1,800/month. For budget-constrained teams, consider limiting thinking budget or routing simpler queries to Haiku.
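One lever for budget-constrained teams is capping thinking per request. Per Anthropic's extended thinking docs, requests accept a `thinking` parameter with a token budget. This sketch only builds the request body; the model ID, budget, and prompt are placeholder values:

```python
request_body = {
    "model": "claude-sonnet-4-6",  # placeholder model ID
    "max_tokens": 4096,
    # Caps how many thinking tokens the model may spend. Thinking tokens
    # bill at the output rate, so this bounds worst-case cost per request.
    "thinking": {"type": "enabled", "budget_tokens": 2048},
    "messages": [{"role": "user", "content": "Refactor this function ..."}],
}
```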

Claude vs the Competition: March 2026

How does Claude stack up against other major API providers?

| Claude Model | Price (in/out) | Competitor | Price (in/out) | Verdict |
| --- | --- | --- | --- | --- |
| Opus 4.6 | $5/$25 | GPT-5.2 | $1.75/$14 | OpenAI 65% cheaper on input |
| Opus 4.6 | $5/$25 | Gemini 2.5 Pro | $1.25/$10 | Gemini 75% cheaper, 1M context |
| Sonnet 4.6 | $3/$15 | GPT-5.2 | $1.75/$14 | OpenAI cheaper, similar quality |
| Sonnet 4.6 | $3/$15 | Gemini 2.5 Flash | $0.30/$2.50 | Gemini 10x cheaper on input |
| Haiku 4.5 | $1/$5 | GPT-5 Mini | $0.25/$2 | OpenAI 4x cheaper |
| Haiku 4.5 | $1/$5 | DeepSeek V3.2 | $0.28/$0.42 | DeepSeek 3.5x cheaper on input, 12x on output |
| Haiku 4.5 | $1/$5 | Grok 4.1 Fast | $0.20/$0.50 | xAI 5x cheaper, 2M context |

On raw price, Claude is the most expensive mainstream API at every tier. OpenAI undercuts on input pricing, Google undercuts on both, and DeepSeek is in a different league on cost.

But price is not the whole story. Claude is widely regarded as stronger at:

  • Following complex, nuanced instructions without drift
  • Maintaining consistent tone and style across long outputs
  • Code generation quality (Sonnet is a developer favorite)
  • Handling ambiguity without hallucinating
  • Agentic tool use reliability

The right approach: benchmark both on your specific task. If Claude's quality advantage translates to measurable business value (fewer errors, better user satisfaction, less post-processing), the premium pays for itself. For many teams, it does.

For a detailed OpenAI pricing breakdown, see our OpenAI API Pricing Guide 2026. For a broader comparison across all providers, see our LLM Pricing Comparison.

Claude Code and Subscription Plans vs API

If you use Claude through the consumer products (claude.ai) or Claude Code (the CLI tool), subscription plans may be cheaper than the API for individual use:

| Plan | Price | Best For |
| --- | --- | --- |
| Free | $0/month | Light experimentation |
| Pro | $20/month | Regular individual use |
| Max | $100/month (5x) or $200/month (20x) | Heavy Claude Code users |
| Team | $30/seat/month | Small teams |
| Enterprise | Custom | Large organizations |

For heavy Claude Code users, the Max plan at $100-200/month can be dramatically cheaper than equivalent API usage. One developer reported using 10 billion tokens over 8 months via Max subscription -- the equivalent API cost would have been $15,000+.

The API is the right choice when you need programmatic access, custom integrations, batch processing, or control over model parameters. Subscriptions are better for interactive use.

Cost Optimization Playbook

Here are the strategies we use with clients to reduce Claude API costs by 50-90%:

1. Model routing (40-60% savings). Use a two-stage pipeline: Haiku classifies or routes requests, Sonnet handles complex cases, Opus handles edge cases. A typical distribution (70% Haiku / 25% Sonnet / 5% Opus) cuts costs dramatically compared to running everything through Sonnet.
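To see where the 40-60% figure comes from, compute the blended per-MTok price of the 70/25/5 split against sending everything to Sonnet (current-tier prices hardcoded; the split is illustrative):

```python
# (input, output) prices in $/MTok for the current generation of each tier.
TIERS = {"haiku": (1.00, 5.00), "sonnet": (3.00, 15.00), "opus": (5.00, 25.00)}
SPLIT = {"haiku": 0.70, "sonnet": 0.25, "opus": 0.05}  # share of requests

blended_in = sum(SPLIT[m] * TIERS[m][0] for m in SPLIT)
blended_out = sum(SPLIT[m] * TIERS[m][1] for m in SPLIT)

# vs. Sonnet's $3/$15: about 43% cheaper on both input and output.
print(round(blended_in, 2), round(blended_out, 2))  # 1.7 8.5
```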

2. Prompt caching (up to 90% on input). If your system prompt exceeds 1,000 tokens and you make more than a few requests per 5-minute window, enable caching. The ROI is immediate.

3. Batch API (50% savings). Any workload that can tolerate 24-hour latency -- nightly processing, bulk analysis, content generation pipelines -- should use batch.

4. Output token management. Output tokens cost 5x more than input across all tiers. A prompt that says "respond concisely in under 150 words" can halve your output costs. Design for brevity.

5. Stay under 200K input tokens. If you are near the boundary, trim context to avoid the 2x long-context surcharge on the entire request.

6. Stack discounts. Batch + caching multipliers compound. A Sonnet batch request hitting a 5-minute cache costs just $0.15/MTok input (vs $3.00 standard) -- a 95% reduction.
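The stacking arithmetic, with the published multipliers (a sketch; real bills depend on your actual hit rate and batch mix):

```python
base = 3.00        # Sonnet 4.6 standard input, $/MTok
cache_read = 0.10  # prompt-cache hit multiplier
batch = 0.50       # Batch API multiplier

# Both discounts apply to the same token: $3.00 x 0.1 x 0.5 = $0.15/MTok.
print(round(base * cache_read * batch, 2))  # 0.15
```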

The Decision Framework

Here is how we advise clients to choose:

  • Start with Sonnet 4.6 ($3/$15) for any user-facing feature. It handles chatbots, code assistants, content generation, and data extraction well. This is the right default for 80% of production use cases.
  • Use Haiku 4.5 ($1/$5) for classification, routing, moderation, and any task with objectively measurable success criteria. Test Haiku first -- you might not need Sonnet.
  • Use Opus 4.6 ($5/$25) for high-stakes tasks where errors have real consequences. At the new price point, it is justifiable for document review, complex reasoning, and agentic workflows. Do not use it for tasks where Sonnet performs within 5% of Opus quality.
  • Avoid legacy Opus ($15/$75). Opus 4.6 is more capable and 67% cheaper. There is no reason to use Opus 4.1 or 4 for new projects.
  • Migrate off Haiku 3. It is deprecated and retiring April 2026. Move to Haiku 4.5.
  • Do not pay for Opus when you mean Sonnet. We see this constantly. Teams default to the biggest model "just to be safe" and spend 67% more than necessary. Test Sonnet first. Measure. Upgrade with data, not anxiety.

Anthropic's pricing strategy is becoming clear: hold Sonnet and Haiku prices steady while dropping Opus aggressively.

  • Opus: $15/$75 (4.0/4.1) to $5/$25 (4.5/4.6) -- a 67% drop in 6 months
  • Sonnet: $3/$15 across three generations (4, 4.5, 4.6) -- rock stable
  • Haiku: $0.25/$1.25 to $1/$5 -- actually increased with Haiku 4.5, reflecting much better capability

Expect this pattern to continue. Opus will likely approach Sonnet pricing within 12-18 months as inference efficiency improves. Sonnet may hold at $3/$15 through 2027 while getting more capable with each generation.

The practical takeaway: do not over-optimize model selection today. Pick the cheapest tier that meets your quality bar and revisit quarterly. The cost floor is still dropping.

---

Prices verified against [Anthropic's official pricing page](https://platform.claude.com/docs/en/about-claude/pricing) as of March 2026. Compare Claude against OpenAI, Google, and other providers in our [LLM Pricing Calculator](/tools/llm-pricing-calculator).

Need help building AI into your product?

We design, build, and integrate production AI systems. Talk directly with the engineers who'll build your solution.

Get in touch

Written by

Aniket Kulkarni

Aniket Kulkarni is the founder of Curlscape, an AI consulting firm that helps companies build and ship production AI systems. With experience spanning voice agents, LLM evaluation harnesses, and bespoke AI solutions, he works at the intersection of engineering and applied machine learning. He writes about practical AI implementation, model selection, and the tools shaping the AI ecosystem.


Frequently Asked Questions

How much does the Claude API cost per million tokens?

Claude API pricing depends on the model tier. As of March 2026: Opus 4.6 costs $5 input / $25 output per million tokens, Sonnet 4.6 costs $3 input / $15 output, and Haiku 4.5 costs $1 input / $5 output. Batch API offers 50% off these prices, and prompt caching can reduce input costs by up to 90%.

What is the cheapest Claude API model?

Claude Haiku 3 at $0.25/$1.25 per million tokens is the cheapest, but it is deprecated and retiring April 2026. The cheapest current-generation model is Haiku 4.5 at $1/$5 per million tokens. With batch API pricing, Haiku 4.5 drops to $0.50/$2.50.

What is the difference between Claude Opus, Sonnet, and Haiku?

Opus ($5/$25 per million tokens) is the most intelligent model, best for complex reasoning, agentic workflows, and high-stakes tasks. Sonnet ($3/$15) is the production workhorse for chatbots, coding, and content. Haiku ($1/$5) is optimized for speed and volume -- classification, routing, and preprocessing. Most production systems should default to Sonnet.

Is Claude cheaper than OpenAI GPT-5?

No. OpenAI undercuts Anthropic at every tier. Claude Sonnet 4.6 costs $3/$15 vs GPT-5.2 at $1.75/$14. Claude Haiku 4.5 costs $1/$5 vs GPT-5 Mini at $0.25/$2. However, Claude is widely preferred for instruction-following quality, coding, and nuanced tasks, so the price premium may be justified by measurable quality improvements.

How can I reduce my Claude API costs?

Five strategies: (1) Use prompt caching for up to 90% savings on repeated input. (2) Use batch API for 50% off non-latency-sensitive workloads. (3) Route requests by complexity -- Haiku for simple tasks, Sonnet for complex ones. (4) Keep input under 200K tokens to avoid the 2x long-context surcharge. (5) Design prompts for concise output, since output tokens cost 5x more than input.

What happens when I exceed 200K input tokens on Claude?

When your input exceeds 200,000 tokens, Anthropic applies premium pricing to the entire request, not just the tokens above 200K. On Sonnet 4.6, input price doubles from $3 to $6 per million tokens, and output increases from $15 to $22.50. Stay under 200K when possible, or budget for the 2x surcharge.

Does Claude charge for extended thinking tokens?

Yes. Extended thinking tokens are billed at the standard output token rate. A Sonnet 4.6 response with 3,000 thinking tokens and 1,000 visible output tokens costs the same as 4,000 output tokens ($0.06 total). The billed token count will not match the visible response length.

Does Anthropic offer a free tier for the Claude API?

New accounts receive initial credits to experiment with the API. There is no permanent free tier for ongoing use. For budget-conscious development, Haiku 4.5 at $1/$5 per million tokens is the most affordable current-generation model, and batch API pricing halves that to $0.50/$2.50.

Should I use Claude API or a Claude subscription plan?

For programmatic access, batch processing, and custom integrations, use the API. For interactive use and Claude Code, subscription plans (Pro at $20/month, Max at $100-200/month) can be dramatically cheaper. Heavy Claude Code users report that the Max plan saves thousands compared to equivalent API usage.

What are Claude API rate limits?

Claude uses a tiered rate limit system based on cumulative API spend. Tier 1 (free) allows 50 requests/minute for Sonnet. Tier 2 ($40 spend) increases to 1,000 RPM. Tier 3 ($200 spend) allows 2,000 RPM. Tier 4 ($2,000 spend) unlocks 4,000 RPM and the 1M context window beta. Limits are enforced per model.
