ChatGPT Token Limits (GPT-5.4)
March 2026 brought a major leap for OpenAI. GPT-5.4, released on March 5, fuses reasoning, coding, and computer use into a single unified model with a 1M token context window and 128K max output tokens. This is a dramatic increase from the 128K context of GPT-4o and the 400K window of earlier GPT-5 variants.
At the $20/month ChatGPT Plus tier, users get access to GPT-5.4 with daily usage caps. The $200/month Pro tier offers unlimited access to the full reasoning model. For API users, pricing sits at $1.25 per million input tokens and $10 per million output tokens — matching Gemini 2.5 Pro token-for-token.
Practical capacity: 1M tokens translates to roughly 750,000 words or about 3,000 pages of text. That's enough to process an entire novel, a comprehensive codebase, or a year's worth of quarterly reports in a single conversation. The 128K output limit means GPT-5.4 can generate roughly 96,000 words per response — ample for most use cases.
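The word and page figures above come from common rules of thumb. A minimal sketch of the arithmetic, assuming the widely used heuristics of roughly 0.75 English words per token and roughly 250 words per page (both approximations; actual ratios vary by tokenizer and formatting):

```python
WORDS_PER_TOKEN = 0.75  # heuristic; varies by language and tokenizer
WORDS_PER_PAGE = 250    # heuristic for a standard manuscript page

def tokens_to_words(tokens: int) -> int:
    """Estimate word capacity for a given token budget."""
    return int(tokens * WORDS_PER_TOKEN)

def tokens_to_pages(tokens: int) -> int:
    """Estimate page capacity for a given token budget."""
    return tokens_to_words(tokens) // WORDS_PER_PAGE

print(tokens_to_words(1_000_000))  # 750000 words of input capacity
print(tokens_to_pages(1_000_000))  # 3000 pages
print(tokens_to_words(128_000))    # 96000 words of output capacity
```

Swap in your own ratio if you work in a language or domain where tokenization differs (code, for example, tends to use more tokens per word).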
Claude Token Limits (Opus 4.6 & Sonnet 4.6)
Anthropic's headline news in March 2026: Claude Opus 4.6's 1M token context window has left beta and is now generally available (GA). Previously, Claude's maximum practical context was 200K tokens. The jump to 1M puts it on equal footing with GPT-5.4 for raw context capacity.
Claude Sonnet 4.6 retains its 200K token context window, the largest Anthropic offers at its $20/month consumer tier without stepping up to Opus. For Pro subscribers, that 200K window handles full-length books, legal contracts, and annual reports without truncation. Claude has consistently been praised for maintaining higher coherence and reasoning quality across long contexts than its competitors.
The trade-off: Claude Opus 4.6 commands premium API pricing at $15/$75 per million input/output tokens, roughly 12× more expensive than GPT-5.4 for input tokens. The value proposition is quality, not volume: Claude's Constitutional AI approach produces fewer hallucinations (~3% on some benchmarks), more consistent long-document reasoning, and a writing voice that reads more naturally than competitors'.
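To see what that price gap means per request, here is a small sketch that converts the per-million-token rates quoted above into dollar costs for a single call. The price table mirrors the article's figures; the model keys are just labels for this example:

```python
# Prices in dollars per 1M tokens, as quoted in this article.
PRICES = {
    "gpt-5.4": {"input": 1.25, "output": 10.0},
    "claude-opus-4.6": {"input": 15.0, "output": 75.0},
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one API call at the listed per-million rates."""
    p = PRICES[model]
    return (input_tokens * p["input"] + output_tokens * p["output"]) / 1_000_000

# A 100K-token prompt with a 5K-token answer:
print(round(request_cost("gpt-5.4", 100_000, 5_000), 3))          # 0.175
print(round(request_cost("claude-opus-4.6", 100_000, 5_000), 3))  # 1.875
```

At that prompt size the same call costs about ten times more on Opus, which is why high-volume pipelines tend to reserve it for the documents where its quality edge actually matters.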
Gemini Token Limits (3.1 Pro & Flash-Lite)
Google holds the undisputed context window crown. Gemini 3.1 Pro offers a 10M token context window, ten times larger than any competitor's flagship. Even the budget tier, Gemini Flash-Lite, provides 1M tokens at just $0.075 per million input tokens, more than an order of magnitude cheaper than OpenAI's and Anthropic's flagship models.
10M tokens is roughly 7.5 million words — about 30,000 pages. That's the equivalent of processing an entire legal discovery archive, a multi-year email corpus, or an hours-long video with its full transcript in a single context. For multimodal work (combining text with images, audio, and video), Gemini's context advantage is even more pronounced because media files consume tokens rapidly.
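Before committing a large corpus to a single context, it helps to sanity-check whether it fits. A rough sketch, assuming the common heuristic of about 4 characters per token for English text (real tokenizers vary, so leave headroom):

```python
CHARS_PER_TOKEN = 4  # heuristic for English prose; code and CJK text differ

def estimate_tokens(text: str) -> int:
    """Crude token estimate from character count."""
    return len(text) // CHARS_PER_TOKEN

def fits_in_window(texts: list[str], window_tokens: int) -> bool:
    """True if the combined corpus fits within the context window."""
    return sum(estimate_tokens(t) for t in texts) <= window_tokens

# Five documents of ~1M tokens each (~4M characters apiece):
corpus = ["x" * 4_000_000] * 5
print(fits_in_window(corpus, 1_000_000))   # False: ~5M tokens overflows a 1M window
print(fits_in_window(corpus, 10_000_000))  # True: fits comfortably in 10M
```

For production use you would substitute the provider's actual tokenizer for the character heuristic, but the fit check itself works the same way.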
The trade-off: Response latency increases with very long contexts, and Gemini's reasoning depth on complex tasks has historically trailed GPT-5 and Claude. A 10M context window is only useful if the model can maintain reasoning coherence across that entire span — and independent tests show quality degradation with extremely long inputs. For most practical workflows, the effective window is smaller than the theoretical maximum.
Which Model Should You Choose?
Token limits are one dimension, but the right choice depends on what you're actually doing. Here's a practical guide: