Agent runtime

Context Budget

The total number of tokens an AI agent has available for instructions, memory, retrieved context, conversation history, tool definitions, and tool results, and how that budget is allocated across them.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

A context budget is the explicit accounting of how many tokens go to which purpose inside a single model call: system instructions, persistent personality, user memory, retrieved documents, conversation history, tool definitions, and tool outputs. The total is bounded by the model's context window; the allocation is bounded by what the agent actually needs to succeed.
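As a rough illustration of what "explicit accounting" can look like, the sketch below models a budget as a mapping from slot name to a token allowance, checked against the context window. The class name ContextBudget, its methods, the slot names, and every number are assumptions made for this example, not part of any particular framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical per-call budget: every name and number here is illustrative."""

    context_window: int      # hard upper bound imposed by the model
    slots: dict[str, int]    # slot name -> maximum tokens allowed for that slot

    def allocated(self) -> int:
        """Total tokens promised across all slots."""
        return sum(self.slots.values())

    def headroom(self) -> int:
        """Tokens left in the window after all slots; negative means over-allocated."""
        return self.context_window - self.allocated()

    def overruns(self, measured: dict[str, int]) -> dict[str, int]:
        """Map each slot that exceeded its allowance to the number of tokens over."""
        return {
            name: used - self.slots.get(name, 0)
            for name, used in measured.items()
            if used > self.slots.get(name, 0)
        }


# Illustrative allocation covering the seven purposes listed above.
budget = ContextBudget(
    context_window=8_000,
    slots={
        "system": 500,
        "persona": 200,
        "memory": 300,
        "retrieval": 3_000,
        "history": 2_000,
        "tool_definitions": 800,
        "tool_outputs": 1_000,
    },
)

# Measured usage from one (made-up) call: retrieval went over, history did not.
report = budget.overruns({"retrieval": 3_400, "history": 1_500})
```

Comparing measured usage against the declared slots on every call is one simple way to surface a slot that has quietly grown past its allowance.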

Why it matters

Token usage drives both cost and latency. Allocating 80% of the budget to a 50-page retrieved PDF and 20% to the actual user request means the agent pays for context that mostly does not help it answer. Allocating evenly carries the opposite risk: the slot holding the document the question is actually about may be too small to fit it. Explicit budgeting forces the design decision and makes regressions detectable, because a slot that starts exceeding its allocation shows up as a measurable change.

How it works

We declare a budget per slot (system: N tokens, retrieval: M tokens, history: K tokens), and each slot has its own way of staying within it. Retrieval limits top-k and chunk size. History is compacted. Tool definitions are filtered at the registry, so the model sees only the tools relevant to the current step. The budget itself is part of the workflow config, so any change to it is gated on evals.
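The three enforcement mechanisms above can be sketched in a few small functions. This is a minimal illustration under stated assumptions: count_tokens is a whitespace stand-in for a real tokenizer, compaction is reduced to its simplest form (keep the newest turns that fit, where a real system might summarize older turns instead), and the function names and the registry shape are hypothetical.

```python
def count_tokens(text: str) -> int:
    """Stand-in token counter (whitespace split); swap in the model's real tokenizer."""
    return len(text.split())


def fit_retrieval(ranked_chunks: list[str], budget: int, top_k: int) -> list[str]:
    """Keep at most top_k chunks, best first, stopping before the budget is exceeded."""
    kept: list[str] = []
    used = 0
    for chunk in ranked_chunks[:top_k]:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept


def compact_history(turns: list[str], budget: int) -> list[str]:
    """Simplest possible compaction: keep the most recent turns that fit the budget."""
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):          # walk from newest to oldest
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()                        # restore chronological order
    return kept


def filter_tools(registry: dict[str, set[str]], step: str) -> list[str]:
    """Return the names of tools tagged as relevant to the given workflow step."""
    return sorted(name for name, steps in registry.items() if step in steps)


# Illustrative inputs only.
chunks = fit_retrieval(["a b c", "d e", "f g h i"], budget=5, top_k=3)
history = compact_history(["old turn one", "mid turn", "new"], budget=3)
tools = filter_tools(
    {"search": {"research"}, "send_email": {"notify"}, "calc": {"research", "notify"}},
    step="research",
)
```

Each function takes its budget as an argument rather than hard-coding it, which keeps the numbers in the workflow config where a change can be reviewed and evaluated.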

Related resources