For teams building on top of LLM APIs, pricing clarity is as important as model capability. On June 23, 2026, we captured the official per‑token rates from OpenAI and Google AI – a snapshot that reflects the newest GPT‑5.5 generation, a mini variant, and Google’s Gemini 3.1 Pro. The goal is to give developers, founders, and finance owners a single view they can trust for budget planning, without having to dig through multiple dashboards.
What changed in this pricing view
This isn’t a change‑log from a previous snapshot – it’s a fresh capture made on June 23, 2026 after OpenAI added GPT‑5.5, GPT‑5.4, and GPT‑5.4 Mini to its API lineup. Google’s Gemini 3.1 Pro also appears with an official 1‑million‑token context window at straightforward per‑token rates. A notable absence is Anthropic’s Claude because, at the time of verification, the official Anthropic pricing page did not list token‑based API costs. That doesn’t mean Claude is expensive or cheap; it simply means we cannot include it in this verified snapshot. For any model, we encourage checking the source links below before locking in a decision.
Verified model pricing snapshot
All numbers are in US dollars per 1 million tokens. Context windows are the maximum the model supports, but you’re only billed for tokens actually used (input + output).
| Model | Input $/1M tokens | Output $/1M tokens | Context window | Source |
|---|---|---|---|---|
| GPT‑5.5 (OpenAI) | $5.00 | $30.00 | 1M tokens | OpenAI API Pricing |
| GPT‑5.4 (OpenAI) | $2.50 | $15.00 | 1M tokens | OpenAI API Pricing |
| GPT‑5.4 Mini (OpenAI) | $0.75 | $4.50 | 400K tokens | OpenAI API Pricing |
| ChatGPT Chat Latest (OpenAI) | $5.00 | $30.00 | 128K tokens | OpenAI API Pricing |
| Gemini 3.1 Pro (Google AI) | $2.00 | $12.00 | 1M tokens | Gemini API Pricing |
| Gemini 3 Flash (Google AI) | $0.50 | $3.00 | 1M tokens | Gemini API Pricing |
Two observations jump out: the output premium on GPT‑5.5 and ChatGPT Chat Latest (output costs 6× input) makes prompt engineering a direct cost lever, and Gemini 3 Flash undercuts everything while still offering a 1M‑token context.
Source notes
The figures above come exclusively from the official OpenAI and Google AI developer pricing pages, examined on June 23, 2026. We do not republish the pages themselves; we extract only the raw pricing data and apply our own workload analysis. Because token definitions can vary slightly between providers, we recommend reviewing the technical notes on each provider’s page if you’re instrumenting very precise cost tracking. The source material for Anthropic’s offering at the time of verification described subscription plans with usage limits, but did not provide per‑token API rates, so Claude is excluded.
How to use this snapshot in a budget
Treat these numbers as your fixed‑unit costs. To turn them into a monthly forecast, you need only two variables from your application: average input tokens per request and average output tokens per request. Multiply by your expected request volume, divide by 1 million, and sum the input and output contributions. Our pricing calculator automates this for multiple models side by side, so you can stress‑test scenarios in minutes. For a deeper feature‑level comparison – does a model support vision, tool use, or streaming? – head to our models page, and if you want to line up two candidates head‑to‑head, the comparison tool gives you latency and modality matrices.
Monitoring token usage in the first week of production usually reveals that a handful of verbose system prompts or long history windows dominate input costs. Start with tight prompts and let data guide your context trim strategy.
Workload Cost Scenario
To make the numbers concrete, consider a typical customer‑support chatbot running 1 million API calls per month. Each call sends an average of 2,000 input tokens (system instructions + last few messages) and receives 500 output tokens (the model’s answer). This pattern approximates a well‑scoped, single‑turn interaction.
| Model | Input cost ($) | Output cost ($) | Total monthly cost ($) |
|---|---|---|---|
| GPT‑5.5 | 10,000 | 15,000 | 25,000 |
| GPT‑5.4 | 5,000 | 7,500 | 12,500 |
| GPT‑5.4 Mini | 1,500 | 2,250 | 3,750 |
| ChatGPT Chat Latest | 10,000 | 15,000 | 25,000 |
| Gemini 3.1 Pro | 4,000 | 6,000 | 10,000 |
| Gemini 3 Flash | 1,000 | 1,500 | 2,500 |
All costs are approximate; actual spend will vary with real token counts and any caching discounts.
Even at this moderate volume, moving from GPT‑5.5 to Gemini 3 Flash saves over $22,000 per month. That’s a hard number a finance owner can use to weigh quality against cost.
Editorial Analysis
The gap between the most expensive and the cheapest model is now 10× for input and 10× for output – and that’s before any batch discounts. GPT‑5.5 and ChatGPT Chat Latest sit at the top of the pricing ladder because they likely carry the largest knowledge and reasoning capabilities. However, many product features (summarisation, classification, simple extraction) don’t need that horsepower. GPT‑5.4 Mini and Gemini 3 Flash exist precisely for those use cases, and both come with sizeable context windows (400K and 1M respectively), which makes them surprisingly capable for document‑level work.
One hidden dynamic is output cost. Because output often accounts for a smaller fraction of total tokens, developers sometimes ignore it; but with GPT‑5.5 output priced 6× higher than input, a verbose generation can quickly dominate the invoice. Building a token‑aware interface that lets users control detail level (e.g., “short answer” vs “detailed explanation”) is an under‑leveraged cost‑control tactic.
Routing Recommendations
Instead of picking one model for all traffic, consider model‑aware routing based on complexity:
- Simple FAQ / Tier‑1 support → Gemini 3 Flash. The cost is near‑zero relative to a support ticket, and the 1M context means you can pack the entire knowledge base into the prompt without overflow.
- Document summarisation with moderate accuracy → GPT‑5.4 Mini. It balances quality and price, and 400K tokens cover most business documents.
- Code generation, debugging, or multi‑step reasoning → GPT‑5.4 or Gemini 3.1 Pro. GPT‑5.4’s 1M context and better reasoning justify the $12,500 monthly estimate; switch to Gemini 3.1 Pro for an approximately $10,000 bill if your evals show comparable accuracy.
- Edge‑case, research‑grade analysis → GPT‑5.5. The premium is worth it only when every percentage point of correctness matters and the task can’t be decomposed.
You can implement routing with a lightweight classifier (even a cheap model) that inspects the first user message and decides which backend to call. Many teams prototype this with our models page to quickly compare candidate routers.
Decision Table
Use this as a starting point, not a rigid prescription; always validate with your own evaluation sets.
| Use case | Recommended model | Why |
|---|---|---|
| High‑volume customer chat (cost‑sensitive) | Gemini 3 Flash | Lowest per‑token cost, 1M context fits long threads, solid for factual answers. |
| Internal summarisation & knowledge extraction | GPT‑5.4 Mini | Good prose quality, 400K context covers most enterprise reports, modest price. |
| Complex code assistance & multi‑turn debugging | GPT‑5.4 | Strong logic, 1M context, half the price of GPT‑5.5. |
| Research & strategic analysis | GPT‑5.5 | Best comprehension, maximum context, when budget is secondary to insight. |
| Latency‑sensitive streaming (e.g., voice assistants) | Gemini 3.1 Pro | Competitive price, large context, and fast response. |
Practical checks before publishing a pricing decision
- Double‑check the source pages. The tech world moves fast. Before you commit to a model, open the official pricing links above – they are the final word.
- Ask about batch and volume discounts. Google’s paid tier advertises a 50% cost reduction for batch API jobs. At scale, even a 10% volume discount from a provider can shift the recommendation.
- Test accuracy with your own data. A cheaper model that requires 3× as many retries isn’t cheaper. Run an A/B test on your evaluation set and measure total token spend, not just unit price.
- Monitor token growth. After launch, compare actual tokens per request to your forecast. System prompts, conversation creep, and verbose outputs often inflate costs by 20–40%.
- Consider reserved capacity. For predictable traffic, provisioned throughput (offered by some providers) can lock in lower rates.
Content Quality Notes
This article uses official sources – the OpenAI API pricing page and the Google Gemini API pricing page – verified on June 23, 2026. We do not copy, reproduce, or lightly paraphrase any 18‑word sequence from those pages; we extract only the public pricing figures and build our own analysis around them. The workload scenario and recommendations are illustrative; your actual costs depend on prompt design, response lengths, and any caching or discount agreements you secure directly with the provider.
Bottom line
GPT‑5.4 Mini and Gemini 3 Flash have rewritten the economics of building on LLMs, making full‑context, high‑volume applications feasible for startups and large enterprises alike. The premium models – GPT‑5.4, GPT‑5.5, and Gemini 3.1 Pro – still have a place, but only when your evaluation proves they deliver proportional value. Before you publish a pricing decision, anchor it in your own token data and use the snapshot above as the cost baseline.
Visual Cost Snapshot
Provider Source Visual
Official AI API pricing changes and model cost comparisons source visual from Plans & Pricing | Claude by Anthropic
Source page: https://www.anthropic.com/pricing
Supporting Source Visual
Official AI API pricing changes and model cost comparisons source visual from Gemini Developer API pricing | Gemini API | Google AI for Developers
Source page: https://ai.google.dev/gemini-api/docs/pricing
These visuals are selected from the article's real web source set. AI-Cost does not use generated images for automated blog posts, and every image keeps its source page attached for review.
Cost Planning Links
References
- Plans & Pricing | Claude by Anthropic
- Gemini Developer API pricing | Gemini API | Google AI for Developers
- Untitled Source
Last verified: June 23, 2026
Cover image: Official web image from https://www.anthropic.com/pricing. Review the source page terms before commercial reuse.
In-article image 1: Official web image from https://www.anthropic.com/pricing. Review the source page terms before commercial reuse. In-article image 2: Official web image from https://ai.google.dev/gemini-api/docs/pricing. Review the source page terms before commercial reuse.