Context Budget
The total number of tokens an AI agent has available for instructions, memory, retrieved context, conversation history, and tool results — and how that budget is allocated across them.
What it is
A context budget is the explicit accounting of how many tokens go to which purpose inside a single model call: system instructions, persistent personality, user memory, retrieved documents, conversation history, tool definitions, and tool outputs. The total is bounded by the model's context window; the allocation is bounded by what the agent actually needs to succeed.
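As a rough illustration of this accounting, the sketch below records one allocation and checks it against the window. The class name `ContextBudget`, the 8,000-token window, and every per-slot number are hypothetical values chosen for the example, not recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextBudget:
    """Hypothetical record of how one model call's tokens are allocated."""
    context_window: int      # hard upper bound set by the model
    slots: dict[str, int]    # tokens granted to each purpose

    def total_allocated(self) -> int:
        return sum(self.slots.values())

    def headroom(self) -> int:
        # Tokens left in the window after every slot is filled to its limit.
        return self.context_window - self.total_allocated()

    def validate(self) -> None:
        if self.headroom() < 0:
            raise ValueError(
                f"slots need {self.total_allocated()} tokens, "
                f"window is {self.context_window}"
            )


# Illustrative numbers only (assumed for this sketch).
budget = ContextBudget(
    context_window=8000,
    slots={
        "system": 800,
        "memory": 400,
        "retrieval": 3000,
        "history": 2000,
        "tool_definitions": 600,
        "tool_outputs": 1000,
    },
)
budget.validate()
print(budget.total_allocated(), budget.headroom())
```

Keeping the allocation in one object means an oversized slot fails validation when the config is loaded, not as a truncated prompt at call time.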
Why it matters
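Every token in the window costs money and adds latency. Giving 80% of the budget to a 50-page retrieved PDF and leaving 20% for everything else, including the actual user request, means the agent pays for context it mostly never uses. Splitting the budget evenly across slots is no better: the retrieval slot may then be too small to hold the document the question is actually about. Explicit budgeting forces the design decision, and because the allocation is written down, a regression shows up as a measurable change.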
How it works
We declare per-slot budgets (system: N tokens, retrieval: M tokens, history: K tokens), and each slot has its own way of staying within its limit. Retrieval limits top-k and chunk size. History is compacted. Tool definitions are filtered at the registry, so the model sees only the tools relevant to the current step. The budget itself is part of the workflow config, and any change to it must pass evals before it ships.
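The three enforcement mechanisms above can be sketched as follows. This is a minimal sketch under stated assumptions, not a reference implementation: `count_tokens` is a whitespace word count standing in for the model's real tokenizer, compaction is done by dropping the oldest turns (summarizing is another option), and the function names, registry layout, and budget numbers are all hypothetical.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace word count as a stand-in for a real tokenizer.
    return len(text.split())


def fit_retrieval(ranked_chunks: list[str], budget: int, top_k: int) -> list[str]:
    """Keep at most top_k chunks in rank order, stopping before the budget is exceeded."""
    kept, used = [], 0
    for chunk in ranked_chunks[:top_k]:
        cost = count_tokens(chunk)
        if used + cost > budget:
            break
        kept.append(chunk)
        used += cost
    return kept


def compact_history(turns: list[str], budget: int) -> list[str]:
    """Keep the most recent turns that fit; older turns are dropped."""
    kept, used = [], 0
    for turn in reversed(turns):
        cost = count_tokens(turn)
        if used + cost > budget:
            break
        kept.append(turn)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


def filter_tools(registry: dict[str, dict], step: str, budget: int) -> list[str]:
    """Return names of tools tagged for this step whose definitions fit the budget."""
    names, used = [], 0
    for name, tool in registry.items():
        if step not in tool["steps"]:
            continue
        cost = count_tokens(tool["definition"])
        if used + cost > budget:
            break
        names.append(name)
        used += cost
    return names


# Hypothetical per-slot budgets and inputs, kept tiny so the effect is visible.
budgets = {"retrieval": 8, "history": 6, "tools": 10}
chunks = ["alpha beta gamma delta", "epsilon zeta eta", "theta iota"]
turns = ["hello there", "how can I help you today", "reset my password please"]
registry = {
    "search_docs": {"steps": {"research"}, "definition": "search_docs(query) returns matching passages"},
    "send_email": {"steps": {"notify"}, "definition": "send_email(to, body) sends a message"},
    "fetch_url": {"steps": {"research"}, "definition": "fetch_url(url) returns page text"},
}

kept_chunks = fit_retrieval(chunks, budgets["retrieval"], top_k=3)
kept_turns = compact_history(turns, budgets["history"])
kept_tools = filter_tools(registry, "research", budgets["tools"])
```

With these inputs the third chunk and the two oldest turns are dropped, and only the two tools tagged for the `research` step are exposed. In practice the budgets would come from the workflow config instead of an inline dictionary.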
Related resources
The discipline of deciding what an AI model sees on every call — instructions, retrieved data, memory, tool definitions, examples — and how to assemble them reliably as the workflow grows.
Shrinking an AI agent's conversation history so the most relevant context stays in the model's window without exceeding the token budget — by summarizing, truncating, or selectively dropping turns.
Memory that lives outside the model's context window — in a database, a vector store, or a structured memory store — and is retrieved on demand instead of carried in every call.
The total spend on language-model APIs across an organization — input tokens, output tokens, embeddings, fine-tuning — and the practice of attributing, optimizing, and budgeting it.