Model Routing
How an AI system decides which model to call for each step — based on privacy, cost, latency, quality, and what happens when a provider goes down.
Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.
Why route
One model is rarely optimal for every task in a workflow. Classification, retrieval reasoning, summarization, structured extraction, coding, and final answer generation have different cost-per-quality curves. A small, fast, cheap model is often the right choice for classification and structured extraction; a frontier reasoning model earns its cost only on the steps where reasoning quality changes the outcome.
- Per-step routing rather than per-workflow
- Frontier models for high-complexity reasoning
- Private or local models for sensitive workloads
- Fallbacks for provider outage or quality regression
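A minimal sketch of per-step routing with ordered fallbacks, under stated assumptions: the step names, the model identifiers (`small-fast`, `frontier-reasoning`, `local-private`), the `ProviderError` exception, and the `call_model` callable are all hypothetical stand-ins for whatever a real gateway exposes.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical route table: each workflow step lists models in preference order.
# The sensitive step deliberately has no public fallback.
ROUTES: dict[str, list[str]] = {
    "classification": ["small-fast", "frontier-reasoning"],
    "structured_extraction": ["small-fast", "frontier-reasoning"],
    "retrieval_reasoning": ["frontier-reasoning", "small-fast"],
    "sensitive_summarization": ["local-private"],
}


class ProviderError(Exception):
    """Hypothetical error for an outage, timeout, or unusable output."""


@dataclass
class RoutedResult:
    step: str
    model: str           # the model that actually answered
    output: str
    attempts: list[str]  # every model tried, in order


def run_step(step: str, prompt: str,
             call_model: Callable[[str, str], str]) -> RoutedResult:
    """Try each model configured for the step until one succeeds."""
    attempts: list[str] = []
    for model in ROUTES[step]:
        attempts.append(model)
        try:
            return RoutedResult(step, model, call_model(model, prompt), attempts)
        except ProviderError:
            continue  # fall through to the next configured model
    raise ProviderError(f"all routes failed for step {step!r}: {attempts}")


# Usage: simulate an outage of the primary model for classification.
def fake_call(model: str, prompt: str) -> str:
    if model == "small-fast":
        raise ProviderError("simulated outage")
    return f"[{model}] {prompt}"


result = run_step("classification", "Is this a refund request?", fake_call)
print(result.model, result.attempts)
```

Keeping the table keyed by step rather than by workflow is what lets a cheap model handle classification while a stronger one handles reasoning inside the same workflow.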
Routing axes
A routing decision is an explicit trade-off across five axes:
- Privacy: does this data leave a controlled boundary?
- Cost: price per output token times expected output length.
- Latency: the p95 budget for the calling workflow.
- Quality: the eval score on this task class.
- Failure mode: what happens when the call times out or returns garbage.
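One way to make that trade-off explicit is to treat privacy, latency, and quality as hard constraints and then pick the cheapest remaining candidate. The sketch below assumes exactly that policy; the `Candidate` and `StepRequirements` types, the model names, and every number are illustrative, and the failure-mode axis is left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    inside_boundary: bool          # True if data stays in a controlled boundary
    price_per_output_token: float  # illustrative prices, not real ones
    p95_latency_ms: int
    eval_score: float              # score on this task class, 0.0 to 1.0


@dataclass(frozen=True)
class StepRequirements:
    sensitive: bool
    p95_budget_ms: int
    min_eval_score: float
    expected_output_tokens: int


def estimated_cost(c: Candidate, req: StepRequirements) -> float:
    # Cost axis as described above: price per output token times expected length.
    return c.price_per_output_token * req.expected_output_tokens


def choose_route(candidates: list[Candidate], req: StepRequirements) -> Candidate:
    """Filter on hard constraints, then minimize estimated cost."""
    eligible = [
        c for c in candidates
        if (c.inside_boundary or not req.sensitive)
        and c.p95_latency_ms <= req.p95_budget_ms
        and c.eval_score >= req.min_eval_score
    ]
    if not eligible:
        raise LookupError("no candidate satisfies the step requirements")
    return min(eligible, key=lambda c: estimated_cost(c, req))


CANDIDATES = [
    Candidate("small-fast", False, 0.000002, 400, 0.91),
    Candidate("frontier-reasoning", False, 0.00006, 3000, 0.97),
    Candidate("local-private", True, 0.00001, 1500, 0.88),
]

classify = StepRequirements(sensitive=False, p95_budget_ms=1000,
                            min_eval_score=0.90, expected_output_tokens=20)
summarize_private = StepRequirements(sensitive=True, p95_budget_ms=2000,
                                     min_eval_score=0.85, expected_output_tokens=300)

print(choose_route(CANDIDATES, classify).name)
print(choose_route(CANDIDATES, summarize_private).name)
```

Other policies are equally defensible, for example maximizing quality under a cost ceiling; the point is that the constraints and the objective are written down rather than implied by a hardcoded model name.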
Operational realities
Provider rate limits, model deprecations, region availability, and pricing changes all affect routing at runtime. The gateway absorbs those changes so the workflow does not. Every routed call carries the route decision in its trace so cost and quality can be attributed by route, not just by workflow.
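To illustrate attribution by route, the sketch below attaches a route decision to each trace record and then aggregates cost and quality per (step, route) pair. The `TraceRecord` fields, the `reason` strings, and the per-call `quality_score` are assumptions made for illustration, not a documented trace schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TraceRecord:
    workflow: str
    step: str
    route: str            # model or provider the gateway selected
    reason: str           # e.g. "primary" or "fallback:rate_limit" (hypothetical)
    cost_usd: float
    quality_score: float  # hypothetical per-call score, 0.0 to 1.0


def summarize_by_route(records: list[TraceRecord]) -> dict[tuple[str, str], dict]:
    """Aggregate call count, total cost, and mean quality per (step, route)."""
    totals = defaultdict(lambda: {"calls": 0, "cost_usd": 0.0, "score_sum": 0.0})
    for r in records:
        t = totals[(r.step, r.route)]
        t["calls"] += 1
        t["cost_usd"] += r.cost_usd
        t["score_sum"] += r.quality_score
    return {
        key: {
            "calls": t["calls"],
            "cost_usd": t["cost_usd"],
            "mean_quality": t["score_sum"] / t["calls"],
        }
        for key, t in totals.items()
    }


records = [
    TraceRecord("support-triage", "classification", "small-fast", "primary", 0.25, 1.0),
    TraceRecord("support-triage", "classification", "small-fast", "primary", 0.5, 0.5),
    TraceRecord("support-triage", "classification", "frontier-reasoning",
                "fallback:rate_limit", 2.0, 1.0),
]

summary = summarize_by_route(records)
for key, stats in summary.items():
    print(key, stats)
```

Because the route and the reason travel with every record, a cost spike caused by fallback traffic shows up as its own row instead of being averaged into the workflow total.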
What it works with
Lives in the AI Platform gateway. Reads routing requirements from each step of an Agent Workflow. Emits route attribution to Observability. Receives optimization proposals from Self-Optimizing Agents. Enforced by Governance policy (which routes are allowed for which data classifications). Tested by Workflow Evals (does a candidate route maintain quality on the eval set).
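The Governance and Workflow Evals interactions can be sketched as two checks on a proposed route: is it allowed for the data classification, and does it hold quality on the eval set. The classification labels, the route names, the `ALLOWED_ROUTES` table, and the `max_regression` tolerance below are assumptions chosen for illustration.

```python
# Hypothetical governance policy: which routes each data classification may use.
ALLOWED_ROUTES: dict[str, set[str]] = {
    "public": {"small-fast", "frontier-reasoning", "local-private"},
    "internal": {"frontier-reasoning", "local-private"},
    "regulated": {"local-private"},
}


def is_route_allowed(data_classification: str, route: str) -> bool:
    # Unknown classifications get no routes, so the check fails closed.
    return route in ALLOWED_ROUTES.get(data_classification, set())


def passes_eval_gate(baseline_scores: list[float],
                     candidate_scores: list[float],
                     max_regression: float = 0.01) -> bool:
    """Compare mean eval scores for the current and candidate routes."""
    if not baseline_scores or len(baseline_scores) != len(candidate_scores):
        raise ValueError("score lists must be non-empty and the same length")
    baseline_mean = sum(baseline_scores) / len(baseline_scores)
    candidate_mean = sum(candidate_scores) / len(candidate_scores)
    return candidate_mean >= baseline_mean - max_regression


def accept_proposal(data_classification: str, candidate_route: str,
                    baseline_scores: list[float],
                    candidate_scores: list[float]) -> bool:
    """A proposed route must pass policy first, then the eval gate."""
    return (is_route_allowed(data_classification, candidate_route)
            and passes_eval_gate(baseline_scores, candidate_scores))


# A cheaper route proposed for regulated data is rejected by policy,
# regardless of how well it scores.
print(accept_proposal("regulated", "small-fast", [1.0, 1.0], [1.0, 1.0]))
# The same route on public data is accepted when quality holds.
print(accept_proposal("public", "small-fast", [1.0, 0.5], [1.0, 0.5]))
```

Checking policy before quality means an optimization proposal can never trade a data-boundary violation for a better score.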
When you need it
Signals: workflows hardcoded to one model provider; AI costs that cannot be attributed per step or per tenant; a provider outage that took down an AI feature with no fallback; regulated data going to a public API because the workflow author was not sure how to route it elsewhere.
Related resources
- Workflow Evals: the test suite your AI workflows have to pass before any change reaches users, measuring quality, latency, cost, and safety on real production data instead of vibes.
- Governance: the policy layer for what an AI system is allowed to read, call, decide, and ship, encoded as configuration the runtime enforces, not as a document on a shared drive.
- Observability: trace-level visibility into every model call, retrieval, tool invocation, decision, approval, and failure inside an AI workflow. It is the substrate every other discipline (evals, optimization, governance) reads from.
A capability in the Group e-media information AI stack. This resource connects the subject to data substrate, agent runtime, evals, and operations.