Operations

LLM Cost

The total spend on language-model APIs across an organization — input tokens, output tokens, embeddings, fine-tuning — and the practice of attributing, optimizing, and budgeting it.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

LLM cost is the operational discipline of managing what your AI workloads spend on model providers. It covers per-request, per-tenant, and per-workflow attribution; budget enforcement; measurement of cost-quality trade-offs; and the ongoing optimization that captures the cost reductions the model landscape keeps offering.

Why it matters

Left unsupervised, AI costs grow. A workflow that costs $0.10 per call at low volume becomes a six-figure line item at scale (one million calls at that price is $100,000), often before anyone notices. Without attribution, costs cannot be optimized, and without optimization, the team pays for capability the workload doesn't need.
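To make the scaling concrete, here is a minimal sketch of the arithmetic. The $0.10 per-call figure comes from the paragraph above; the monthly call volumes and the function name monthly_spend are hypothetical, chosen only for illustration.

```python
from decimal import Decimal

# Per-call cost taken from the text; everything else here is an assumption.
COST_PER_CALL = Decimal("0.10")


def monthly_spend(calls_per_month: int) -> Decimal:
    """Return the monthly spend in USD for a given call volume."""
    return COST_PER_CALL * calls_per_month


# Hypothetical volumes: a pilot, a growing feature, and a product at scale.
for calls in (1_000, 100_000, 1_000_000):
    print(f"{calls:>9,} calls/month -> ${monthly_spend(calls):,.2f}")
```

The per-call price never changes in this sketch. Only volume does, which is why the line item can go unnoticed until it is already large.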

How it works

Every model call is traced with its token counts and the provider's price at the time of the call. Costs roll up per tenant, per workflow, per route, and per environment. Self-Optimizing Agents proposes cheaper variants for the harness to score. Budgets are enforced at the gateway. Drift alerts catch cost regressions before they become incidents.
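The mechanics above can be sketched in a few dozen lines. This is an illustrative sketch, not a description of any particular product's implementation: the names Price, CallTrace, roll_up, within_budget, and drifted are hypothetical, as are the example prices, the 20% drift tolerance, and the assumption that prices are quoted in USD per million tokens.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Price:
    """Assumed pricing shape: USD per one million tokens."""
    input_per_mtok: Decimal
    output_per_mtok: Decimal


@dataclass(frozen=True)
class CallTrace:
    """One traced model call with its attribution dimensions."""
    tenant: str
    workflow: str
    route: str
    environment: str
    input_tokens: int
    output_tokens: int
    price: Price  # snapshot of the price when the call was made

    @property
    def cost(self) -> Decimal:
        """Cost in USD, computed from token counts and the price snapshot."""
        total = (self.input_tokens * self.price.input_per_mtok
                 + self.output_tokens * self.price.output_per_mtok)
        return total / Decimal(1_000_000)


def roll_up(traces, dimension: str) -> dict:
    """Sum cost per value of one dimension, e.g. 'tenant' or 'workflow'."""
    totals = defaultdict(Decimal)
    for trace in traces:
        totals[getattr(trace, dimension)] += trace.cost
    return dict(totals)


def within_budget(traces, tenant: str, budget: Decimal,
                  next_call_cost: Decimal) -> bool:
    """Gateway-style check: would the next call keep the tenant in budget?"""
    spent = roll_up(traces, "tenant").get(tenant, Decimal(0))
    return spent + next_call_cost <= budget


def drifted(baseline_avg: Decimal, current_avg: Decimal,
            tolerance: Decimal = Decimal("0.20")) -> bool:
    """Flag a regression when average cost per call exceeds baseline by more
    than the tolerance (20% here is an arbitrary illustrative threshold)."""
    return current_avg > baseline_avg * (1 + tolerance)


# Hypothetical prices and calls, for illustration only.
example_price = Price(Decimal("3"), Decimal("15"))
example_traces = [
    CallTrace("acme", "summarize", "/v1/chat", "prod", 2000, 500, example_price),
    CallTrace("globex", "search", "/v1/chat", "staging", 1000, 0, example_price),
]
print(roll_up(example_traces, "tenant"))
print(roll_up(example_traces, "environment"))
```

Storing the price snapshot on each trace, rather than looking it up at reporting time, keeps historical costs stable when a provider changes its rates. The same roll_up function serves every dimension, so adding a new attribution axis only requires a new field on the trace.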
