Agent runtime

Compaction

Shrinking an AI agent's conversation history so the most relevant context stays in the model's window without exceeding the token budget — by summarizing, truncating, or selectively dropping turns.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Compaction is what an agent does when its accumulated conversation, retrieved documents, and tool results threaten to exceed the model's context window. Instead of failing or dropping the oldest turns blindly, the agent rewrites or summarizes parts of the history to preserve what matters and discard what doesn't. The goal is to keep the agent useful across long tasks without overflowing the window.

Why it matters

Real agent workflows often span dozens of turns, multiple tool calls, and large retrieved documents. Without compaction, the agent either hits the model's hard token limit, at which point the call fails, or spends its budget on irrelevant history, which crowds out the context that matters and degrades its output. Good compaction is the difference between an agent that can sustain a multi-hour task and one that loses the plot after twenty minutes.

How it works

Common strategies include summarizing older turns into a running synopsis, keeping recent turns verbatim, dropping tool-result payloads once the agent has used them (tool result clearing), selectively retaining turns by relevance score, and moving old content to a separate retrieval index so it can be fetched on demand. These strategies can be combined, and different workloads favor different mixes: research tasks need long historical recall, while ops tasks favor recency.
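Three of these strategies (keeping recent turns verbatim, tool result clearing, and a running synopsis) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the `Turn` type, the `compact` function, the `keep_recent` parameter, and the four-characters-per-token estimate are all hypothetical, and `naive_summarize` is a stand-in for what would normally be a model call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str      # "user", "assistant", "tool", or "system" (illustrative roles)
    content: str


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token. A real system would use
    # the tokenizer that matches its model.
    return max(1, len(text) // 4)


def total_tokens(turns: List[Turn]) -> int:
    return sum(estimate_tokens(t.content) for t in turns)


def naive_summarize(turns: List[Turn]) -> str:
    # Placeholder summarizer: truncates each turn. In practice this would
    # be a call to a model that writes the synopsis.
    return " | ".join(f"{t.role}: {t.content[:40]}" for t in turns)


def compact(
    history: List[Turn],
    budget: int,
    keep_recent: int = 4,
    summarize: Callable[[List[Turn]], str] = naive_summarize,
) -> List[Turn]:
    """Return a history that aims to fit within `budget` estimated tokens."""
    # Nothing to do if the history already fits.
    if total_tokens(history) <= budget:
        return list(history)

    # Keep the most recent turns verbatim; only older turns are compacted.
    split = max(0, len(history) - keep_recent)
    older, recent = history[:split], history[split:]

    # Step 1: tool result clearing. Replace old tool payloads with a stub.
    cleared = [
        Turn(t.role, "[tool result cleared]") if t.role == "tool" else t
        for t in older
    ]
    candidate = cleared + recent
    if total_tokens(candidate) <= budget:
        return candidate

    # Step 2: collapse the older turns into a single running synopsis.
    # Note: the result can still exceed the budget if the recent turns alone
    # are too large; handling that case is left out of this sketch.
    if not cleared:
        return recent
    synopsis = Turn("system", "Summary of earlier turns: " + summarize(cleared))
    return [synopsis] + recent
```

The sketch applies the cheapest step first: clearing tool payloads loses little once the agent has acted on them, whereas summarization is lossy and usually costs an extra model call. A recency-weighted workload might raise `keep_recent`; a recall-heavy one might replace the synopsis step with a write to a retrieval index.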

Related resources

Context Budget

The total number of tokens an AI agent has available for instructions, memory, retrieved context, conversation history, and tool results — and how that budget is allocated across them.

Context Engineering

The discipline of deciding what an AI model sees on every call — instructions, retrieved data, memory, tool definitions, examples — and how to assemble them reliably as the workflow grows.

External Memory

Memory that lives outside the model's context window — in a database, a vector store, or a structured memory store — and is retrieved on demand instead of carried in every call.

Tool Result Clearing

Dropping a tool's response from the agent's context window after the agent has used it — to keep the context lean across long multi-tool workflows.
