Agent runtime

Compaction

Shrinking an AI agent's conversation history so the most relevant context stays in the model's window without exceeding the token budget — by summarizing, truncating, or selectively dropping turns.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Compaction is what an agent does when its accumulated conversation, retrieved documents, and tool results threaten to exceed the model's context window. Instead of failing or dropping the oldest turns blindly, the agent rewrites or summarizes parts of the history to preserve what matters and discard what doesn't. The goal is to keep the agent useful across long tasks without overflowing the window.

Why it matters

Real agent workflows often span dozens of turns, multiple tool calls, and large retrieved documents. Without compaction, the agent either hits a hard token limit (and crashes) or burns budget on irrelevant history (and gets worse). Good compaction is the difference between an agent that can sustain a multi-hour task and one that loses the plot after twenty minutes.

How it works

Common strategies: summarize older turns into a running synopsis; keep recent turns verbatim; drop tool-result payloads after the agent has used them (tool result clearing); selectively retain by relevance score; rotate to a separate retrieval index for old content. Different strategies suit different workloads — research tasks need long historical recall; ops tasks favor recency.

Related resources