Operations

Model Gateway

The component of an AI platform that routes every model call — choosing the provider, applying rate limits and fallback, attributing cost, and emitting traces — so applications never call providers directly.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

A model gateway is the single entry point to every AI model an organization uses. Applications and agents talk to the gateway; the gateway decides which provider and model to actually invoke. It is the chokepoint that turns provider choice into a routing rule instead of a hardcoded dependency.

Why it matters

Without a gateway, switching from one provider to another is a code change in every application. With a gateway, it's a routing rule. The gateway is also where cost attribution, rate-limit handling, fallback logic, and trace emission live — none of which any individual application should be reinventing.
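To make the "routing rule, not code change" point concrete, here is a minimal sketch. The `Route` type, the `ROUTES` table, the task names, and the provider and model names are all hypothetical placeholders, not any real gateway's configuration format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    provider: str
    model: str


# Hypothetical routing table owned by the gateway, keyed by logical task name.
ROUTES = {
    "summarize": Route(provider="provider-a", model="small-model"),
    "extract": Route(provider="provider-b", model="large-model"),
}


def resolve(task: str) -> Route:
    """Return the provider/model currently assigned to a logical task."""
    return ROUTES[task]


# Application code names the task, never the provider.
before = resolve("summarize")

# Switching providers is one change to the table; no application is touched.
ROUTES["summarize"] = Route(provider="provider-b", model="small-model")
after = resolve("summarize")
```

Because callers only ever ask for `"summarize"`, the provider swap is invisible to them. In a real deployment the table would more likely live in configuration than in code.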

How it works

The standard pattern: receive the request, resolve the routing policy (privacy, cost, latency, quality), call the selected provider, handle errors and fall back if needed, attribute cost to the tenant and workflow, and emit an OpenTelemetry trace. See AI Gateway for the broader picture; Model Gateway is specifically the model-call leg.
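The request flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not a reference implementation: the `Gateway` class, the provider stubs, the per-token prices, and the in-memory `spans` list (standing in for an OpenTelemetry exporter) are all hypothetical.

```python
import time
from dataclasses import dataclass, field


class ProviderError(Exception):
    """Raised by a provider stub when a call fails."""


@dataclass
class Request:
    tenant: str
    workflow: str
    prompt: str
    sensitive: bool = False


@dataclass
class Gateway:
    providers: dict        # name -> callable(prompt) returning (text, tokens)
    price_per_token: dict  # name -> price, hypothetical figures
    ledger: dict = field(default_factory=dict)  # (tenant, workflow) -> cost
    spans: list = field(default_factory=list)   # stand-in for a trace exporter

    def resolve(self, req: Request) -> list:
        """Routing policy: ordered candidates, primary first, then fallback."""
        if req.sensitive:
            return ["on_prem"]  # privacy rule: sensitive data stays in-house
        return ["cheap", "on_prem"]

    def handle(self, req: Request) -> str:
        start = time.monotonic()
        failures = []
        for name in self.resolve(req):
            try:
                text, tokens = self.providers[name](req.prompt)
            except ProviderError as exc:
                failures.append(f"{name}: {exc}")
                continue  # fall back to the next candidate
            # Attribute cost to the tenant and workflow that made the call.
            key = (req.tenant, req.workflow)
            cost = tokens * self.price_per_token[name]
            self.ledger[key] = self.ledger.get(key, 0.0) + cost
            # Record a trace entry for the successful call.
            self.spans.append({
                "provider": name,
                "tokens": tokens,
                "fallbacks": failures,
                "seconds": time.monotonic() - start,
            })
            return text
        raise ProviderError("all providers failed: " + "; ".join(failures))


def cheap(prompt):
    raise ProviderError("rate limited")  # simulate a provider rejecting the call


def on_prem(prompt):
    return prompt.upper(), len(prompt.split())  # toy reply and token count


gw = Gateway(
    providers={"cheap": cheap, "on_prem": on_prem},
    price_per_token={"cheap": 0.001, "on_prem": 0.002},
)
reply = gw.handle(Request(tenant="acme", workflow="triage", prompt="hello there"))
```

Here the primary provider fails, the gateway falls back to the second candidate, and the cost lands on the `("acme", "triage")` ledger entry. A production gateway would replace the `spans` list with real tracing and add retries, timeouts, and rate-limit backoff, none of which this sketch attempts.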

Related resources

AI Gateway

A single integration point in front of every AI model your applications use — for routing, key rotation, rate limits, fallback, cost attribution, and observability.

Model Routing

How an AI system decides which model to call for each step — based on privacy, cost, latency, quality, and what happens when a provider goes down.

AI Platform

A capability in the Group e-media information AI stack. This resource connects the subject to data substrate, agent runtime, evals, and operations.

LLM Cost

The total spend on language-model APIs across an organization — input tokens, output tokens, embeddings, fine-tuning — and the practice of attributing, optimizing, and budgeting it.
