Foundations · LLM Architecture · Updated 2026

Context Window

The total amount of text — measured in tokens — that a model can consider in a single call, including the system prompt, conversation history, attachments and the reply being generated.
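The budget arithmetic implied by this definition can be sketched as follows. This is a minimal illustration, not any vendor's API: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words, whereas real models use subword tokenizers that produce different counts, and `remaining_reply_budget` is a name invented here.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): one token per word."""
    return len(text.split())


def remaining_reply_budget(context_window: int, system_prompt: str,
                           history: list[str], attachments: list[str]) -> int:
    """Tokens left for the reply; negative if the input alone overflows."""
    used = count_tokens(system_prompt)
    used += sum(count_tokens(turn) for turn in history)
    used += sum(count_tokens(doc) for doc in attachments)
    return context_window - used


budget = remaining_reply_budget(
    context_window=20,                                # toy window size
    system_prompt="You are a helpful assistant.",     # 5 words
    history=["What is a context window?"],            # 5 words
    attachments=["Notes on attention and tokens."],   # 5 words
)
print(budget)  # 5 tokens remain for the reply
```

The point of the sketch is that the reply shares the same budget as the input: the more the prompt, history and attachments consume, the less room is left for the generated text.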

Self-attention in the Transformer (Vaswani et al., 2017) relates every token in the input sequence to every other token. The maximum sequence length a model can process this way is its context window. Everything the model can "see" at once (instructions, prior turns, retrieved documents and the text it is currently producing) must fit inside it.
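For reference, the scaled dot-product attention defined in Vaswani et al. (2017) is reproduced below. Here Q, K and V are the query, key and value matrices and d_k is the key dimension. For a sequence of n tokens, the product of Q with the transpose of K is an n × n matrix of scores, so the work grows roughly quadratically with sequence length. That is one commonly cited reason the window is bounded in practice.

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
```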

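When a conversation outgrows the window, something has to give. Chat applications typically drop or summarize the oldest content so the request still fits, and the model cannot attend to anything that was left out; some APIs instead reject an over-long request with an error. This is why long conversations lose earlier detail and why retrieval is used to feed only the most relevant passages back in.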
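The loss of earlier detail can be illustrated with a small sketch of oldest-first trimming. It is one possible strategy under stated assumptions, not how any particular product works: `count_tokens` is again a hypothetical word counter, and `trim_history`, `context_window` and `reply_reserve` are names invented for this example.

```python
def count_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer (assumption): one token per word."""
    return len(text.split())


def trim_history(system_prompt: str, turns: list[str],
                 context_window: int, reply_reserve: int) -> list[str]:
    """Keep the newest turns that fit after reserving room for the reply."""
    budget = context_window - reply_reserve - count_tokens(system_prompt)
    kept = []
    # Walk from newest to oldest, keeping turns while they still fit.
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if cost > budget:
            break  # this turn and everything older is dropped
        kept.append(turn)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


turns = [
    "My name is Ada.",
    "Nice to meet you, Ada.",
    "What is a token?",
    "A token is a chunk of text.",
    "And what is my name?",
]
kept = trim_history("Be concise.", turns, context_window=30, reply_reserve=10)
print(kept)
```

With these toy numbers the two oldest turns are dropped, so the turn that stated the user's name is no longer in view when the final question arrives. Retrieval-based approaches address this by selecting passages by relevance rather than recency.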

References

Primary, peer-reviewed and archival sources for this definition.

Attention Is All You Need
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Advances in Neural Information Processing Systems 30 (NeurIPS 2017).

Cite this entry

MultipleChat. "Context Window." MultipleChat AI & LLM Glossary, 2026. https://multiple.chat/ai-glossary/context-window
