Operations

Model Routing

How an AI system decides which model to call for each step — based on privacy, cost, latency, quality, and what happens when a provider goes down.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

Why route

One model is rarely optimal for every task in a workflow. Classification, reasoning over retrieved context, summarization, structured extraction, coding, and final answer generation have different cost-per-quality curves. A small, fast, cheap model is often the right choice for classification and structured extraction; a frontier reasoning model earns its cost on the steps where reasoning quality changes the outcome.

Per-step routing rather than per-workflow
Frontier models for high-complexity reasoning
Private or local models for sensitive workloads
Fallbacks for provider outage or quality regression
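A minimal sketch of per-step routing, assuming each workflow step declares its own route. The `StepRoute` schema, the `WORKFLOW_ROUTES` table, the `candidates` helper, and every model name below are hypothetical placeholders, not real provider identifiers or a real gateway API.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class StepRoute:
    """Routing requirement declared by one workflow step (hypothetical schema)."""
    step: str
    primary: str                      # model used on the happy path
    fallbacks: Tuple[str, ...] = ()   # tried in order if the primary fails
    local_only: bool = False          # sensitive data must stay on private/local inference


# Per-step routing: each step of one workflow picks its own model.
# Model names are placeholders, not real provider identifiers.
WORKFLOW_ROUTES: Dict[str, StepRoute] = {
    "classify": StepRoute("classify", primary="small-fast", fallbacks=("small-fast-alt",)),
    "extract": StepRoute("extract", primary="small-fast", fallbacks=("mid-tier",)),
    "redact_pii": StepRoute("redact_pii", primary="local-private", local_only=True),
    "reason": StepRoute(
        "reason", primary="frontier-reasoning", fallbacks=("frontier-alt", "mid-tier")
    ),
}


def candidates(step: str) -> List[str]:
    """Ordered list of models to try for a step: primary first, then fallbacks."""
    route = WORKFLOW_ROUTES[step]
    return [route.primary, *route.fallbacks]
```

Declaring the route per step, rather than once per workflow, is what lets the cheap classification step and the expensive reasoning step coexist in the same workflow.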

Routing axes

The axes are privacy (does this data leave a controlled boundary?), cost (price per input and output token times the expected length of each), latency (the p95 budget of the calling workflow), quality (eval score on this task class), and failure mode (what happens when the call times out or returns garbage). A routing decision is the explicit trade-off across those axes.
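One way to make that trade-off explicit is sketched below, assuming each candidate model has a known boundary status, per-token prices, a measured p95 latency, and an eval score for the task class. Privacy, latency, and quality act as hard filters; expected cost orders the survivors, and the ordered list doubles as the fallback chain. `Candidate`, `StepRequirements`, `rank_routes`, and all numbers in `CANDIDATES` are hypothetical and illustrative, not real prices or scores.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    """One routable model and its measured properties (hypothetical schema)."""
    name: str
    leaves_boundary: bool     # True if the call sends data outside a controlled boundary
    usd_per_1k_input: float
    usd_per_1k_output: float
    p95_latency_ms: int
    eval_score: float         # score on this task class's eval set, 0.0 to 1.0


@dataclass(frozen=True)
class StepRequirements:
    sensitive: bool           # data must not leave the controlled boundary
    input_tokens: int         # expected prompt length
    output_tokens: int        # expected completion length
    p95_budget_ms: int        # latency budget of the calling workflow
    min_eval_score: float     # quality floor for this task class


def expected_cost(c: Candidate, req: StepRequirements) -> float:
    """Expected USD per call; input and output tokens are priced separately."""
    return (req.input_tokens / 1000) * c.usd_per_1k_input + (
        req.output_tokens / 1000
    ) * c.usd_per_1k_output


def rank_routes(
    cands: List[Candidate], req: StepRequirements
) -> Tuple[List[str], Dict[str, str]]:
    """Filter on privacy, latency, and quality, then order survivors by expected cost.

    Returns (ranked model names, {rejected model: reason}). The ranked list is
    also the fallback order, which covers the failure-mode axis.
    """
    eligible: List[Candidate] = []
    rejected: Dict[str, str] = {}
    for c in cands:
        if req.sensitive and c.leaves_boundary:
            rejected[c.name] = "privacy"
        elif c.p95_latency_ms > req.p95_budget_ms:
            rejected[c.name] = "latency"
        elif c.eval_score < req.min_eval_score:
            rejected[c.name] = "quality"
        else:
            eligible.append(c)
    eligible.sort(key=lambda c: expected_cost(c, req))
    return [c.name for c in eligible], rejected


# Illustrative numbers only; not real prices, latencies, or eval scores.
CANDIDATES = [
    Candidate("frontier-reasoning", True, 0.01, 0.03, 2500, 0.92),
    Candidate("small-fast", True, 0.0005, 0.0015, 400, 0.85),
    Candidate("local-private", False, 0.0002, 0.0002, 900, 0.80),
]
```

With these numbers, a non-sensitive step (2,000 input tokens, 500 output tokens, 3,000 ms budget, 0.84 quality floor) ranks `small-fast` ahead of `frontier-reasoning` and rejects `local-private` on quality; marking the same step sensitive with a 0.75 floor leaves only `local-private`. Recording the rejection reasons makes the trade-off auditable rather than implicit.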

Operational realities

Provider rate limits, model deprecations, region availability, and pricing changes all affect routing at runtime. The gateway absorbs those changes so the workflow does not. Every routed call carries the route decision in its trace so cost and quality can be attributed by route, not just by workflow.
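A minimal sketch of how a gateway might absorb provider failures and still attribute every call by route, assuming an injected `call_model(model, prompt)` client and a `validate(output)` check for garbage responses. `ProviderError`, `RouteTrace`, `routed_call`, and the list used as a trace sink are hypothetical stand-ins for a real provider client and Observability pipeline.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


class ProviderError(Exception):
    """Hypothetical error a provider client raises on rate limits, outages, or retired models."""


@dataclass
class RouteTrace:
    """Route attribution attached to one routed call (hypothetical trace schema)."""
    workflow: str
    step: str
    tenant: str
    attempts: List[Tuple[str, str]] = field(default_factory=list)  # (model, outcome) per try
    route: Optional[str] = None  # model whose output was accepted; None if every route failed
    latency_ms: float = 0.0


def routed_call(
    prompt: str,
    models: List[str],
    call_model: Callable[[str, str], str],
    validate: Callable[[str], bool],
    trace_sink: list,
    *,
    workflow: str,
    step: str,
    tenant: str,
) -> str:
    """Try each model in order, fall back on errors or invalid output, always emit a trace."""
    trace = RouteTrace(workflow, step, tenant)
    start = time.monotonic()
    try:
        for model in models:
            try:
                output = call_model(model, prompt)
            except (ProviderError, TimeoutError) as exc:
                trace.attempts.append((model, f"error:{type(exc).__name__}"))
                continue
            if not validate(output):  # a garbage response counts as a failure too
                trace.attempts.append((model, "invalid_output"))
                continue
            trace.attempts.append((model, "ok"))
            trace.route = model
            return output
        raise RuntimeError(f"all routes failed for step {step!r}")
    finally:
        # Runs on success and on failure, so every call is attributable by route.
        trace.latency_ms = (time.monotonic() - start) * 1000
        trace_sink.append(trace)
```

Because the trace carries `workflow`, `step`, `tenant`, and the accepted `route`, cost and quality can be grouped by any of those keys downstream, and the `attempts` list shows how often each fallback actually fired.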

What it works with

Lives in the AI Platform gateway. Reads routing requirements from each step of an Agent Workflow. Emits route attribution to Observability. Receives optimization proposals from Self-Optimizing Agents. Enforced by Governance policy (which routes are allowed for which data classifications). Tested by Workflow Evals (does a candidate route maintain quality on the eval set).
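The governance and eval checks can be sketched as two gates in front of the router, assuming a policy table keyed by data classification and per-example eval scores for the current and candidate routes. `ROUTE_POLICY`, the classification names, `PolicyViolation`, `enforce_policy`, and `passes_eval_gate` are all hypothetical; comparing mean scores against a fixed tolerance is one simple gating rule, not the only reasonable one.

```python
from statistics import fmean
from typing import Dict, List, Set

# Hypothetical policy table: data classification -> routes that classification may use.
ROUTE_POLICY: Dict[str, Set[str]] = {
    "public": {"frontier-reasoning", "frontier-alt", "small-fast", "local-private"},
    "internal": {"small-fast", "local-private"},
    "regulated": {"local-private"},
}


class PolicyViolation(Exception):
    """Raised when no permitted route exists for the data being sent."""


def enforce_policy(classification: str, models: List[str]) -> List[str]:
    """Drop routes the policy forbids for this classification, keeping the original order."""
    allowed = ROUTE_POLICY.get(classification)
    if allowed is None:
        # Fail closed: an unknown classification gets no routes rather than all routes.
        raise PolicyViolation(f"unknown data classification {classification!r}")
    permitted = [m for m in models if m in allowed]
    if not permitted:
        raise PolicyViolation(f"no permitted route for {classification!r} data")
    return permitted


def passes_eval_gate(
    baseline: List[float], candidate: List[float], max_drop: float = 0.01
) -> bool:
    """Accept a candidate route only if its mean eval score stays within max_drop of baseline.

    Both lists are per-example scores on the same eval set.
    """
    if len(baseline) != len(candidate) or not baseline:
        raise ValueError("scores must cover the same, non-empty eval set")
    return fmean(candidate) >= fmean(baseline) - max_drop
```

Running `enforce_policy` before ranking means a workflow author cannot accidentally send regulated data to a public API; the policy, not the author's judgment, decides which routes exist for that data.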

When you need it

Signals: workflows hardcoded to one model provider; AI costs that cannot be attributed per step or per tenant; a provider outage that took down an AI feature with no fallback; regulated data going to a public API because the workflow author was not sure how to route it elsewhere.

Related resources

Workflow Evals

The test suite your AI workflows have to pass before any change reaches users — measuring quality, latency, cost, and safety on real production data instead of vibes.

Governance

The policy layer for what an AI system is allowed to read, call, decide, and ship — encoded as configuration the runtime enforces, not as a document on a shared drive.

Agent Observability

Trace-level visibility into every model call, retrieval, tool invocation, decision, approval, and failure inside an AI workflow — the substrate every other discipline (evals, optimization, governance) reads from.

Model Routing Policy

A capability in the Group e-media information AI stack. This resource connects the subject to data substrate, agent runtime, evals, and operations.

Private Inference

Model Fallback Strategy
