Model Routing

How an AI system decides which model to call for each step, based on privacy, cost, latency, quality, and what happens when a provider goes down.

Why route

One model is rarely optimal for every task in a workflow. Classification, retrieval reasoning, summarization, structured extraction, coding, and final answer generation have different cost-per-quality curves. A small, fast, cheap model is often the right choice for classification and structured extraction; a frontier reasoning model earns its cost only on the steps where reasoning quality changes the outcome.

  • Per-step routing rather than per-workflow
  • Frontier models for high-complexity reasoning
  • Private or local models for sensitive workloads
  • Fallbacks for provider outage or quality regression
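
As a minimal sketch of per-step routing, a workflow can look up its model by step kind instead of fixing one model for the whole run. The step kinds, model names, and the route_for helper below are hypothetical placeholders, not part of any real gateway API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    primary: str                      # model tried first
    fallbacks: tuple[str, ...] = ()   # tried in order if the primary fails


# Hypothetical routing table: one entry per step kind, not per workflow.
ROUTES = {
    "classification": Route("small-fast", ("small-fast-alt",)),
    "structured_extraction": Route("small-fast", ("small-fast-alt",)),
    "final_answer": Route("frontier-reasoning", ("frontier-alt",)),
    # Sensitive steps stay on a private model and have no public fallback.
    "sensitive_summarization": Route("local-private"),
}


def route_for(step_kind: str) -> Route:
    """Return the route for a step kind, failing loudly on unknown kinds."""
    try:
        return ROUTES[step_kind]
    except KeyError:
        raise ValueError(f"no route configured for step kind {step_kind!r}") from None
```

Raising on an unknown step kind, rather than defaulting to some model, is a design choice in this sketch: a silent default is how sensitive data tends to reach the wrong provider.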

Routing axes

A routing decision weighs five axes: privacy (does this data leave a controlled boundary), cost (price per output token times expected output length), latency (the p95 budget of the calling workflow), quality (eval score on this task class), and failure mode (what happens when the call times out or returns garbage). Each decision is an explicit trade-off across those axes.
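
One way to make that trade-off explicit is to treat privacy, latency, and quality as hard constraints and then minimize expected cost among the candidates that remain. This is a sketch under that assumption; the Candidate and Requirements types, their field names, and the choose function are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str
    in_boundary: bool              # True if data stays inside the controlled boundary
    price_per_output_token: float
    p95_latency_s: float
    eval_score: float              # eval score on this task class


@dataclass(frozen=True)
class Requirements:
    sensitive: bool
    expected_output_tokens: int
    p95_budget_s: float
    min_eval_score: float


def expected_cost(c: Candidate, req: Requirements) -> float:
    # Cost axis as described above: price per output token times expected length.
    return c.price_per_output_token * req.expected_output_tokens


def choose(candidates: Sequence[Candidate], req: Requirements) -> Optional[Candidate]:
    """Filter on privacy, latency, and quality, then pick the cheapest survivor."""
    eligible = [
        c for c in candidates
        if (c.in_boundary or not req.sensitive)
        and c.p95_latency_s <= req.p95_budget_s
        and c.eval_score >= req.min_eval_score
    ]
    if not eligible:
        return None  # failure-mode axis: the caller decides what to do without a route
    return min(eligible, key=lambda c: expected_cost(c, req))
```

Returning None instead of relaxing a constraint keeps the failure mode visible: the calling workflow has to decide whether to queue, degrade, or refuse.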

Operational realities

Provider rate limits, model deprecations, region availability, and pricing changes all affect routing at runtime. The gateway absorbs those changes so the workflow does not. Every routed call carries the route decision in its trace so cost and quality can be attributed by route, not just by workflow.
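
The fallback and attribution behavior described above might look like the following sketch. The call_model callable, the ProviderError type, and the shape of the trace dictionary are assumptions made for illustration; a real gateway would define its own.

```python
from typing import Callable, Sequence


class ProviderError(Exception):
    """Hypothetical error for rate limits, outages, or deprecated models."""


def call_with_fallback(
    route: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
    trace: dict,
) -> str:
    """Try each model in order and record the route decision in the trace."""
    attempts = []
    for model in route:
        try:
            output = call_model(model, prompt)
        except (ProviderError, TimeoutError) as exc:
            # Record the failed attempt, then move on to the next model.
            attempts.append({"model": model, "ok": False, "error": type(exc).__name__})
            continue
        attempts.append({"model": model, "ok": True})
        trace["route"] = {"served_by": model, "attempts": attempts}
        return output
    # Every model failed: keep the attribution so the outage is still visible.
    trace["route"] = {"served_by": None, "attempts": attempts}
    raise ProviderError("all models in route failed")
```

Because the trace records which model actually served the call, cost and quality can be grouped by route after the fact, including calls that were served by a fallback.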

What it works with

Lives in the AI Platform gateway. Reads routing requirements from each step of an Agent Workflow. Emits route attribution to Observability. Receives optimization proposals from Self-Optimizing Agents. Enforced by Governance policy (which routes are allowed for which data classifications). Tested by Workflow Evals (does a candidate route maintain quality on the eval set).
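
The two checks in parentheses above can be sketched as small predicates. The data classifications, destination labels, and regression tolerance here are hypothetical values chosen for illustration, not actual Governance or Workflow Evals settings.

```python
# Hypothetical policy: which destinations each data classification may use.
ALLOWED_DESTINATIONS = {
    "public": {"public_api", "private_cloud", "local"},
    "internal": {"private_cloud", "local"},
    "regulated": {"local"},
}


def route_allowed(data_classification: str, destination: str) -> bool:
    """Governance check: unknown classifications are denied by default."""
    return destination in ALLOWED_DESTINATIONS.get(data_classification, set())


def passes_eval_gate(
    baseline_score: float,
    candidate_score: float,
    max_regression: float = 0.01,
) -> bool:
    """Eval check: a candidate route may not drop quality by more than max_regression."""
    return candidate_score >= baseline_score - max_regression
```

In this sketch a proposed route change would need to pass both checks before the gateway adopts it: the policy check for the data it will carry, and the eval gate for the quality it must maintain.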

When you need it

Signals: workflows hardcoded to one model provider; AI costs that cannot be attributed per step or per tenant; a provider outage that took down an AI feature with no fallback; regulated data going to a public API because the workflow author was not sure how to route it elsewhere.