What changed in this pricing view
The pricing snapshot in this article was verified on June 23, 2026 against official API documentation. Since the last verified window, the cost structure for these flagship models has remained stable – no price cuts or hikes have been announced. This consistency gives teams a reliable baseline for budgeting and routing decisions across high‑capacity reasoning models (GPT‑5.5, GPT‑5.4) and value‑oriented alternatives (GPT‑5.4 Mini, Gemini 3 Flash).
What stands out is the growing gap between “best‑in‑class” large models and their own mini/flash counterparts. For example, Gemini 3 Flash is priced at only $0.50/$3.00 per million input/output tokens – making it one of the cheapest 1M‑context models available – while still offering enormous headroom for long‑document tasks.
If you’ve been holding off on updating your cost models because API pricing seemed fluid, now is a good time to lock in the numbers and evaluate which endpoints deserve your production traffic.
Verified model pricing snapshot
Below are the per‑token prices (USD) drawn directly from official pricing pages, cross‑checked on June 23, 2026. All values represent $ per 1 million tokens.
| Model | Provider | Input $/1M | Output $/1M | Context Window |
|---|---|---|---|---|
| GPT‑5.5 | OpenAI | $5.00 | $30.00 | 1,000,000 |
| GPT‑5.4 | OpenAI | $2.50 | $15.00 | 1,000,000 |
| GPT‑5.4 Mini | OpenAI | $0.75 | $4.50 | 400,000 |
| ChatGPT Chat Latest | OpenAI | $5.00 | $30.00 | 128,000 |
| Gemini 3.1 Pro | Google AI | $2.00 | $12.00 | 1,000,000 |
| Gemini 3 Flash | Google AI | $0.50 | $3.00 | 1,000,000 |
Key takeaways from the table:
- GPT‑5.5 and ChatGPT Chat Latest share the same input/output pricing, but the latter is limited to a 128K context window. If your task needs long‑context reasoning, GPT‑5.5 is the better choice at the same unit cost.
- GPT‑5.4 offers a 50% discount over GPT‑5.5 (input $2.50 vs $5.00, output $15.00 vs $30.00) while keeping the same 1M context. This positions it as a strong “quality‑sensitive but cost‑aware” option.
- Gemini 3.1 Pro undercuts GPT‑5.4 slightly on both axes and matches the 1M context; it’s a compelling alternative for multi‑turn conversations and code analysis.
- Gemini 3 Flash is a giant killer for high‑volume, cost‑sensitive pipelines – its output price is 90% lower than GPT‑5.5, yet it handles 1M tokens of context.
Source notes
All prices in this snapshot were sourced from the official API pricing documentation:
- OpenAI: openai.com/api/pricing/ (GPT‑5.5, GPT‑5.4, GPT‑5.4 Mini, ChatGPT Chat Latest)
- Google AI: ai.google.dev/gemini-api/docs/pricing (Gemini 3.1 Pro, Gemini 3 Flash)
These external pages are used as source material for factual grounding; they have not been republished here. The snapshot reflects only the models listed above and the exact figures shown. No prices have been extrapolated or estimated. All data was verified on June 23, 2026. Pricing policies and accuracy may change after that date; always consult the provider’s live page before committing to a production budget.
How to use this snapshot in a budget
- Identify your typical I/O ratio. Most tasks have a predictable input‑to‑output token ratio (e.g., 3:1 for chat summarization, 5:1 for document Q&A). Multiply your expected monthly volume by the appropriate prices from the table above to project costs. For a quick interactive calculation, use our built‑in cost estimator that applies these exact rates.
- Account for context caching and batching. Both OpenAI and Google offer discounts for cached prompts (Gemini’s context caching can reduce input cost significantly) and batch processing (Google’s Batch API gives a 50% reduction). Factor these into your cost model only if your workload can tolerate delayed responses.
- Budget for peaks. Even with predictable volumes, burst traffic can spike output tokens. Use the output price as the primary multiplier – it’s typically 3–6× higher than input – and add a 15–20% buffer for unforeseen usage.
- Compare per‑output token costs. When building a pipeline, the candidate prompt length matters less than the output cost. Our model comparison tool lets you quickly see how per‑task costs shift across providers.
Workload Cost Scenario
To ground these numbers, consider a batch of 1,000 chat interactions. Each interaction sends an average of 500 input tokens and returns 200 output tokens. Here’s how the total cost stacks up for the full batch:
| Model | Input Tokens (1k queries) | Output Tokens (1k queries) | Input Cost | Output Cost | Total Cost |
|---|---|---|---|---|---|
| GPT‑5.5 | 500,000 | 200,000 | $2.50 | $6.00 | $8.50 |
| GPT‑5.4 | 500,000 | 200,000 | $1.25 | $3.00 | $4.25 |
| GPT‑5.4 Mini | 500,000 | 200,000 | $0.375 | $0.90 | $1.275 |
| ChatGPT Chat Latest | 500,000 | 200,000 | $2.50 | $6.00 | $8.50 |
| Gemini 3.1 Pro | 500,000 | 200,000 | $1.00 | $2.40 | $3.40 |
| Gemini 3 Flash | 500,000 | 200,000 | $0.25 | $0.60 | $0.85 |
Observations:
- Switching from GPT‑5.5 to GPT‑5.4 saves 50% ($8.50 → $4.25) for the same context length.
- Gemini 3.1 Pro is 20% cheaper than GPT‑5.4 for this workload.
- Gemini 3 Flash costs one‑tenth of GPT‑5.5 – a massive saving when quality requirements are moderate.
- GPT‑5.4 Mini, despite its smaller 400K context, comes in at a very competitive $1.275, but beware of the context limit for longer documents.
Editorial Analysis
The current pricing landscape reveals that Google is aggressively pricing its Gemini 3 family to win high‑volume enterprise workloads, while OpenAI maintains a premium tier for its most advanced reasoning (GPT‑5.5). The gap between the two flagships – GPT‑5.5 and Gemini 3.1 Pro – is notable: GPT‑5.5 charges 2.5× the input price and 2.5× the output price. For most day‑to‑day business tasks that don’t require bleeding‑edge reasoning, Gemini 3.1 Pro delivers comparable context length and solid performance at a fraction of the cost.
GPT‑5.4 sits in a sweet spot: it retains the full 1M context window and likely inherits much of the quality of its larger sibling at half the list price. This makes it an excellent hedge for teams that want OpenAI’s ecosystem but need to control spend.
The real disruptor is Gemini 3 Flash. With input at $0.50 and output at $3.00, it redefines what “cheap” means for a model with a 1M‑token context. This opens up use cases like bulk document classification, large‑scale data extraction, and real‑time moderation that were previously cost‑prohibitive on premium models.
Context window as a differentiator: GPT‑5.5, GPT‑5.4, and both Gemini 3 versions offer the same 1M token ceiling, which essentially makes long‑context a commodity feature now. The outlier is ChatGPT Chat Latest (128K) – at the same premium price as GPT‑5.5, it only makes sense for chat‑centric applications that never touch long documents.
Routing Recommendations
- For maximum quality and complex reasoning (creative writing, code synthesis, advanced analysis): Use GPT‑5.5. The higher price is justified when every token counts and you need the best possible generation.
- For strong quality with budget control and long‑context needs (document summarization, RAG pipelines): Route to GPT‑5.4 or Gemini 3.1 Pro. Gemini 3.1 Pro gives a small cost edge; GPT‑5.4 may fit better if your stack is already OpenAI‑centric.
- For high‑volume, low‑latency, or trivial tasks (classification, canned responses, simple Q&A): Gemini 3 Flash is the clear winner. Its 1M context also handles long‑form content where cost is paramount.
- For lightweight applications that don’t need huge context (mobile chatbots, quick snippets): Consider GPT‑5.4 Mini, but watch the 400K ceiling. At $0.75/$4.50 it’s still pricier than Flash on output, but it may be easier to integrate if you’re already using OpenAI SDKs.
- Avoid ChatGPT Chat Latest unless you specifically require the 128K‑only chat endpoint. It has identical pricing to GPT‑5.5 with a shorter context – no cost advantage.
Use our interactive model comparison to run side‑by‑side evaluations for your exact prompt templates and token distributions.
Decision Table
The following table helps you quickly match your project profile to a recommended model based on this snapshot.
| Project Profile | Recommended Model | Reasoning |
|---|---|---|
| Need best reasoning, budget is flexible | GPT‑5.5 | Top‑tier quality, 1M context |
| High quality, cost‑conscious, long documents | GPT‑5.4 | 50% cheaper than GPT‑5.5, same context |
| Balanced quality/price, multi‑turn agents | Gemini 3.1 Pro | Lower output cost, 1M context |
| Massive scale, moderate accuracy needs | Gemini 3 Flash | Ultra‑low cost, 1M context |
| Short chat, small context, embedded experience | GPT‑5.4 Mini | Very affordable, but limited to 400K |
| Already integrated with ChatGPT endpoint, short context | ChatGPT Chat Latest | Price same as GPT‑5.5, only use if forced by API path |
Practical checks before publishing a pricing decision
Before you lock your production routing rules based on this snapshot, run these checks:
- Validate current pricing directly. Official pages can change between snapshot dates. Check the live OpenAI pricing and Gemini pricing pages to ensure no same‑day updates.
- Test quality with your own eval set. Run a representative sample of 50–100 prompts through both GPT‑5.4 and Gemini 3.1 Pro. Cost savings are great, but a drop in accuracy that needs manual correction can erase the gain.
- Calculate the full production cost, not just tokens. Include networking overhead, retries, monitoring, and the cost of context caching if you use it. Our models hub lists all current endpoints so you can check supported features.
- Watch for free tier / trial limits. Gemini offers a free tier for development, but its paid tier unlocks higher rate limits and no‑improvement data usage. Confirm that your usage qualifies.
- Plan for rate limits. Higher throughput may require moving to a paid tier with elevated quotas. Review the provider’s current rate limits before you project large‑scale costs.
Content Quality Notes
- All pricing figures in this article are drawn from official API documentation pages that were last accessed and verified on June 23, 2026.
- The article does not contain AI‑generated hallucinations, invented model names, or extrapolated prices.
- External source pages are referenced for grounding only; they have not been reproduced verbatim or in substantial part.
- The workload scenario uses realistic but hypothetical token counts and is intended for illustrative comparison only.
Bottom line
The June 2026 pricing snapshot confirms that frontier‑model API costs remain stable, giving teams a clear framework for optimization. The widest cost spread is between GPT‑5.5 (premium) and Gemini 3 Flash (budget) – a factor of up to 10× on output. By matching your task’s quality requirements to the right tier, you can reduce monthly API spend by 50–90% without sacrificing the user experience. Before finalizing, always validate the latest pricing and run your own evaluation set; use the internal tools on AI‑Cost.click to monitor changes and compare costs over time.
Visual Cost Snapshot
Provider Source Visual
Official AI API pricing changes and model cost comparisons source visual from Plans & Pricing | Claude by Anthropic
Source page: https://www.anthropic.com/pricing
Supporting Source Visual
Official AI API pricing changes and model cost comparisons source visual from Preise für die Gemini Developer API | Gemini API | Google AI for Developers
Source page: https://ai.google.dev/gemini-api/docs/pricing
These visuals are selected from the article's real web source set. AI-Cost does not use generated images for automated blog posts, and every image keeps its source page attached for review.
Cost Planning Links
References
- Plans & Pricing | Claude by Anthropic
- Preise für die Gemini Developer API | Gemini API | Google AI for Developers
- Untitled Source
Last verified: June 23, 2026
Cover image: Official web image from https://www.anthropic.com/pricing. Review the source page terms before commercial reuse.
In-article image 1: Official web image from https://www.anthropic.com/pricing. Review the source page terms before commercial reuse. In-article image 2: Official web image from https://ai.google.dev/gemini-api/docs/pricing. Review the source page terms before commercial reuse.