List prices are fiction for any optimized LLM workload. All three major providers now discount cached input by 90%, and all three discount batch processing by 50% on input and output. Stack them where stacking is supported and your effective price runs 5–50% of list. Claude Sonnet 4.6's $3.00 input becomes $0.15. gpt-5.4-nano's $0.20 becomes $0.01. This page computes the real numbers — every cell derived from provider list prices using a stated multiplier methodology, so you can check the arithmetic yourself.

Last verified: June 10, 2026

Every list price on this page was checked against the provider's own pricing page on 2026-06-10. Computed cells are pure arithmetic on those verified prices. See the changelog at the bottom for the verification history and the monthly re-check commitment.

LeanLM (not affiliated with Google's LearnLM educational AI) is an LLM cost optimization platform — this page is our reference table for what optimized workloads actually pay per million tokens.

Most LLM cost comparisons stop at list price, which makes them useless for anyone running LLM caching strategies or asynchronous pipelines. The gap between list and effective price is now so large that list-price comparisons routinely point teams at the wrong provider. Below: the methodology, then the tables, then the picks.

Methodology: How Every Cell Is Computed

The formula

Effective input price = list price × batch multiplier (0.5) × cache-read multiplier (0.1), applied where the provider supports stacking. Example: Claude Sonnet 4.6 = $3.00 × 0.5 × 0.1 = $0.15/MTok — 95% off list. Output tokens get the batch multiplier only (caching never discounts output): $15.00 × 0.5 = $7.50/MTok.

Three inputs drive every computed cell:

  1. Cache-read multiplier: 0.1×. All three majors now price cached input reads at 10% of list — OpenAI on GPT-5.x models (prompt caching guide; older models get less: 75% off on gpt-4.1, 50% on gpt-4o), Anthropic flat across the lineup (prompt caching docs), and Gemini 2.5+ (context caching docs — the discount rose from 75% to 90%, so older posts understate it).
  2. Batch multiplier: 0.5×. OpenAI (Batch API), Anthropic, and Gemini (Batch API) all discount asynchronous batch traffic 50% on both input and output.
  3. Cache-write amortization (Anthropic only). Anthropic charges 1.25× input to write a 5-minute cache entry and 2× for the 1-hour TTL. The cached columns below show the pure read price (0.1×); at a stated assumption of 1 write per 20 reads on the 5-minute tier, the blended multiplier is (1.25 + 20 × 0.1) ÷ 21 ≈ 0.155× list — e.g. Sonnet 4.6 blends to ~$0.46/MTok cached instead of $0.30. OpenAI has no write surcharge; Gemini's implicit caching has none either, while explicit caching bills a storage fee ($1.00/MTok/hr on Flash models, $4.50/MTok/hr on Pro) instead.

Who lets you stack batch + cache? Anthropic: yes, verbatim — its pricing page states caching multipliers "stack with other pricing modifiers, including the Batch API discount." OpenAI: yes for GPT-5+ models in the Batch API (pre-GPT-5 models are not supported for caching in Batch); Flex processing on the Responses API is OpenAI's recommended combine path — same 50% discount with full caching support. Gemini: yes for explicit context caching — discounted batch caching rates are published per model (e.g. gemini-3.5-flash cached input at $0.075 in batch vs $0.15 interactive); whether implicit caching applies inside batch jobs is undocumented.

Effective LLM cost per million input tokens: list versus cached versus batch versus stacked, June 2026
The same model, four prices: list, cached (×0.1), batch (×0.5), and stacked (×0.05) — effective input cost spans a 20× range before you've changed a single model choice.
95%
off list price for stacked batch + cached input where providers support combining the two discounts — $3.00 list becomes $0.15 effective on Claude Sonnet 4.6.

The Effective Cost Tables (June 2026)

All prices in USD per 1 million tokens. Cached input is the provider's published cache-hit read price (always 0.1× list here). Batch input = list × 0.5. Batch + cache input = list × 0.5 × 0.1 = list × 0.05, shown only where the provider supports stacking. Batch output = list output × 0.5.

OpenAI

Source: OpenAI API pricing. Cached input is 90% off on all GPT-5.x models, with no cache-write surcharge and 24-hour cache retention by default on gpt-5.5 and later. Batch + cache stacking applies to GPT-5+ models only; Flex processing is the recommended path for combining the two.

Model List input Cached input Batch input Batch + cache input List output Batch output
gpt-5.5 $5.00 $0.50 $2.50 $0.25 $30.00 $15.00
gpt-5.4 $2.50 $0.25 $1.25 $0.125 $15.00 $7.50
gpt-5.4-mini $0.75 $0.075 $0.375 $0.0375 $4.50 $2.25
gpt-5.4-nano $0.20 $0.02 $0.10 $0.01 $1.25 $0.625

gpt-5.5-pro ($30.00 input / $180.00 output) is excluded because it offers no cached input pricing at all — there is no effective-cost story to compute. Pre-GPT-5 models do not get cached pricing inside the Batch API, so their batch + cache column would be their plain batch price.

Anthropic

Source: Anthropic pricing. The only provider that states batch + cache stacking verbatim on its pricing page. Cached input below is the pure read price; writes cost 1.25× list (5-min TTL) or 2× (1-hour) — at 1 write per 20 reads, blend to ~0.155× list (see methodology). Note Opus 4.8 is $5/$25; the older $15/$75 applied only to the deprecated Opus 4.1/4.

Model List input Cached input Batch input Batch + cache input List output Batch output
Claude Fable 5 $10.00 $1.00 $5.00 $0.50 $50.00 $25.00
Claude Opus 4.8 $5.00 $0.50 $2.50 $0.25 $25.00 $12.50
Claude Sonnet 4.6 $3.00 $0.30 $1.50 $0.15 $15.00 $7.50
Claude Haiku 4.5 $1.00 $0.10 $0.50 $0.05 $5.00 $2.50

Cache pre-warming requests (max_tokens: 0) are rejected inside batches, so plan cache writes on the interactive path. Fast mode (the Opus 4.6–4.8 latency premium) stacks with caching but is not available with Batch.

Google Gemini

Source: Gemini API pricing. The batch + cache column reflects explicit context caching inside batch jobs, for which Google publishes discounted rates per model; explicit caching also bills storage at $1.00/MTok/hr (Flash) or $4.50/MTok/hr (Pro), not included in the per-token figures below. Implicit (automatic) caching inside batch jobs is undocumented.

Model List input Cached input Batch input Batch + cache input List output Batch output
gemini-3.1-pro-preview* $2.00 $0.20 $1.00 $0.10 $12.00 $6.00
gemini-3.5-flash $1.50 $0.15 $0.75 $0.075 $9.00 $4.50
gemini-3-flash-preview $0.50 $0.05 $0.25 $0.025 $3.00 $1.50
gemini-3.1-flash-lite $0.25 $0.025 $0.125 $0.0125 $1.50 $0.75
gemini-2.5-flash $0.30 $0.03 $0.15 $0.015 $2.50 $1.25

*gemini-3.1-pro-preview prices are the ≤200k-token-prompt tier; prompts over 200k tokens bill $4.00 input / $18.00 output ($0.40 cached) — double every computed cell for the long-context tier. Google's published batch rates ($1.00/$6.00 for 3.1 Pro, $0.75/$4.50 for 3.5 Flash, and so on) match the ×0.5 derivation exactly.

DeepSeek (caveat rows)

Source: DeepSeek pricing. DeepSeek caching is fully automatic — there's no cache_control to manage, and no batch-discount tier to stack, so this table has only three price columns. Cache-hit prices below are as of the April 26, 2026 price cut, which dropped hit pricing to roughly 1/10 of its prior level.

Model Input (cache miss) Input (cache hit) Output
deepseek-v4-flash $0.14 $0.0028 $0.28
deepseek-v4-pro $0.003625*

*The v4-pro cache-hit price appears to be promotional — secondary sources put the regular rate at ~$0.0145 — and DeepSeek's miss/output prices for v4-pro were not independently verified for this table, so those cells are left blank rather than estimated. The v4-flash cache hit at $0.0028 is a 98% discount off its $0.14 miss price — the deepest cache discount of any provider listed.

Early Access

What's Your Effective Cost?

LeanLM profiles your production traffic, measures your real cache hit rate and batchable share, and computes your actual effective cost per model — then validates cheaper configurations against your own workload before anything ships. Get early access.

You're on the list. We'll be in touch.

No spam, ever.

Best Value Picks, by Tier

Reading the stacked column (batch + cache input) across providers:

One trend worth flagging: gemini-3.5-flash lists at 5× the input price of gemini-2.5-flash ($1.50 vs $0.30) and 3.6× on output ($9.00 vs $2.50). The Flash tier is no longer reflexively cheap — which makes caching discipline matter far more on 3.5 Flash than it ever did on 2.5. A 3.5 Flash workload with no caching costs 5× more than the 2.5 Flash workload it replaced; the same workload with a high cache hit rate ($0.15 cached) lands back in the old neighborhood.

Honest Caveats

Price is not quality. This table tells you what each model costs, not which model is good enough for your task. The cheapest capable model wins — and "capable" is an empirical question per task type, which is exactly what LLM model routing solves: route each query to the cheapest model that meets your quality bar, and let the effective prices above set the savings ceiling.

Prices change — this page changes with them. OpenAI's cached-input discount went from 50% (gpt-4o era) to 90% on GPT-5.x. Gemini's implicit cache discount rose from 75% to 90%. DeepSeek cut cache-hit prices 10× in April 2026. Any pricing table without a verification date is already suspect; this one carries a changelog and gets re-verified monthly against the provider pages.

Tokenizers differ, so cross-provider $/MTok is not apples-to-apples. Anthropic's docs note the Opus 4.7+ tokenizer "may use up to 35% more tokens for the same fixed text." A $3.00/MTok model that tokenizes your corpus into 30% more tokens is effectively a $3.90/MTok model for your workload. When comparing across providers, multiply by your measured tokens-per-document ratio, not just the rate card.

The stacked column assumes both discounts apply to the same token. Real workloads blend: some traffic is interactive (no batch), some prompts miss the cache, and Anthropic cache writes cost extra. Use the methodology above to blend by your actual hit rate and async share. The mechanics of getting hit rates up — prompt structure, prefix stability, TTL management — are covered in our prompt caching deep-dive, and the operational side of the 50% tier in the LLM batch API guide. For where these two levers fit in the broader stack, see enterprise LLM cost optimization.

Early Access

Stop Paying List Price

Most teams capture less than half of the discounts on this page because cache hit rates and batch routing are nobody's job. LeanLM makes them measurable — and validates every change against your production quality bar. Join the waitlist.

You're on the list. We'll be in touch.

No spam, ever.

Frequently Asked Questions

What is the cheapest LLM API per million tokens in 2026?

On effective (not list) pricing as of June 2026: the cheapest cached input anywhere is DeepSeek v4-flash at $0.0028/MTok on cache hits. The cheapest batch + cache stacked input from a major provider is gpt-5.4-nano at $0.01/MTok ($0.20 list × 0.5 batch × 0.1 cache). The cheapest stacked flagship-class input is gemini-3.1-pro-preview at $0.10/MTok via explicit caching inside batch.

Can you combine LLM batch discounts with prompt caching?

It depends on the provider. Anthropic: yes — the pricing page states caching multipliers stack with the Batch API discount. OpenAI: yes for GPT-5+ models in the Batch API (pre-GPT-5 models don't get caching in Batch); Flex processing is OpenAI's recommended way to combine the 50% discount with full caching. Google Gemini: yes for explicit context caching — discounted batch caching rates are published per model; implicit caching inside batch jobs is undocumented.

How do you calculate effective LLM cost per million tokens?

Effective input price = list price × batch multiplier (0.5 if the workload is asynchronous) × cache-read multiplier (0.1 on cache hits, where the provider supports stacking). Example: Claude Sonnet 4.6 at $3.00 list becomes $3.00 × 0.5 × 0.1 = $0.15/MTok — 95% off list. For Anthropic, add cache-write amortization: at 1 write (1.25× input) per 20 reads, the blended cache multiplier is ~0.155× instead of 0.1×. Weight the result by your actual cache hit rate.

How accurate and up to date is this LLM pricing table?

Every list price was verified against the provider's own pricing page on June 10, 2026, and every computed cell is derived from those list prices using the stated multiplier methodology — nothing is recalled from memory or copied from third-party aggregators. The page carries a changelog and a monthly re-verification commitment. DeepSeek cache-hit prices reflect the April 26, 2026 price cut and are labeled accordingly.

Changelog