Compaction
Shrinking an AI agent's conversation history so the most relevant context stays in the model's window without exceeding the token budget — by summarizing, truncating, or selectively dropping turns.
Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.
What it is
Compaction is what an agent does when its accumulated conversation, retrieved documents, and tool results threaten to exceed the model's context window. Instead of failing or dropping the oldest turns blindly, the agent rewrites or summarizes parts of the history to preserve what matters and discard what doesn't. The goal is to keep the agent useful across long tasks without overflowing the window.
Why it matters
Real agent workflows often span dozens of turns, multiple tool calls, and large retrieved documents. Without compaction, the agent either hits a hard token limit (and crashes) or burns budget on irrelevant history (and gets worse). Good compaction is the difference between an agent that can sustain a multi-hour task and one that loses the plot after twenty minutes.
How it works
Common strategies: summarize older turns into a running synopsis; keep recent turns verbatim; drop tool-result payloads after the agent has used them (tool result clearing); selectively retain by relevance score; rotate to a separate retrieval index for old content. Different strategies suit different workloads — research tasks need long historical recall; ops tasks favor recency.
Related resources
The total number of tokens an AI agent has available for instructions, memory, retrieved context, conversation history, and tool results — and how that budget is allocated across them.
The discipline of deciding what an AI model sees on every call — instructions, retrieved data, memory, tool definitions, examples — and how to assemble them reliably as the workflow grows.
Memory that lives outside the model's context window — in a database, a vector store, or a structured memory store — and is retrieved on demand instead of carried in every call.
Dropping a tool's response from the agent's context window after the agent has used it — to keep the context lean across long multi-tool workflows.