AI & Automation
What is a Context Window?
Definition
The maximum amount of text an LLM can process in a single interaction — encompassing the full conversation history, system instructions, and documents provided.
In more detail
The context window is the LLM's working memory for a single interaction. Everything the model can 'see' at once — the system prompt, the conversation history, any documents you've passed in, and the response it's generating — must fit within this limit. It's measured in tokens, where one token is roughly three-quarters of a word in English.
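The words-to-tokens ratio above can be turned into a quick budgeting heuristic. This is a rough sketch, not a real tokenizer (libraries such as tiktoken give exact counts per model); the 0.75 ratio is the approximation stated above and only holds for typical English prose.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~0.75 words-per-token heuristic.

    Real tokenizers give exact, model-specific counts; this is only
    for back-of-the-envelope context budgeting.
    """
    words = len(text.split())
    # one token is roughly three-quarters of a word, so ~4/3 tokens per word
    return round(words / 0.75)
```

Useful for a sanity check before sending a request, not for billing-accurate counts.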
Current context window sizes vary significantly: GPT-4o supports 128,000 tokens (~96,000 words), Claude 3.5 Sonnet supports 200,000 tokens (~150,000 words). These numbers sound large, but they fill up quickly in production applications — a long document, a multi-turn conversation, and a detailed system prompt can easily consume tens of thousands of tokens per request.
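The way a window "fills up" is simple arithmetic. The sketch below uses a 128,000-token limit and illustrative component sizes (the specific numbers are assumptions, not measurements) to show how a document, history, and system prompt plus space reserved for the response eat into the budget.

```python
CONTEXT_LIMIT = 128_000  # e.g. a 128k-token model

# Illustrative token costs for one request (hypothetical numbers)
usage = {
    "system_prompt": 2_000,
    "attached_document": 40_000,
    "conversation_history": 30_000,
}
reserved_for_response = 4_000  # output tokens also count against the window

used = sum(usage.values()) + reserved_for_response
remaining = CONTEXT_LIMIT - used
print(f"used {used:,} of {CONTEXT_LIMIT:,} tokens; {remaining:,} remaining")
```

Even with a single mid-sized document, more than half the window can be gone before the model writes a word.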
Context window size directly affects cost and latency, since most LLM APIs charge per token processed. Well-architected AI systems use strategies like RAG (retrieving only relevant chunks rather than loading whole document libraries), conversation summarisation (compressing history rather than keeping every message), and chunking to work efficiently within context limits.
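One of the simplest of these strategies, a sliding-window trim of conversation history, can be sketched as follows. This is a cruder alternative to true summarisation (it drops old messages rather than compressing them), and the function and parameter names are illustrative, not from any particular framework.

```python
def trim_history(messages, max_tokens, count_tokens):
    """Keep only the most recent messages that fit within max_tokens.

    messages: list of message strings, oldest first.
    count_tokens: callable returning a token count for a string
                  (e.g. a real tokenizer, or a rough estimator).
    """
    kept = []
    total = 0
    # Walk backwards from the newest message, stopping at the budget
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if total + cost > max_tokens:
            break
        kept.append(msg)
        total += cost
    # Restore oldest-first order
    return list(reversed(kept))
```

A production system would typically summarise the dropped messages instead of discarding them outright, so long-range context is compressed rather than lost.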
Why it matters
Context window limitations shape how AI applications must be architected. Misunderstanding them leads to unexpected costs, degraded model performance on long documents, or systems that break once conversations get long enough.
Related service
Working with Context?
I offer AI Integration & Agentic Workflows for businesses ready to move from understanding to implementation.
Learn about AI Integration & Agentic Workflows →