AI Gateway
A single integration point in front of every AI model your applications use — for routing, key rotation, rate limits, fallback, cost attribution, and observability.
What it is
An AI gateway is a service that sits between your applications and the AI model providers (Anthropic, OpenAI, Google, AWS Bedrock, Azure, Mistral, your own private deployments). Applications send their request to the gateway; the gateway decides which provider and model to actually call, handles the response, and emits a trace. From the application's perspective, switching from one provider to another is a config change, not a code rewrite.
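A minimal sketch of the indirection described above, in Python. The `Gateway` and `Route` names, the provider keys, and the logical model name `chat-default` are all hypothetical. The stand-in providers only echo their input; a real gateway would wrap each vendor's SDK at that point and also emit a trace per call.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical provider adapter signature: (model, prompt) -> completion text.
ProviderFn = Callable[[str, str], str]


@dataclass(frozen=True)
class Route:
    provider: str  # key into the gateway's provider registry
    model: str     # provider-specific model identifier


class Gateway:
    """Applications ask for a logical model name; config decides the real target."""

    def __init__(self, providers: Dict[str, ProviderFn], routes: Dict[str, Route]):
        self.providers = providers
        self.routes = routes  # shared reference, so config updates take effect immediately

    def complete(self, logical_model: str, prompt: str) -> str:
        route = self.routes[logical_model]
        call = self.providers[route.provider]
        return call(route.model, prompt)


def fake_provider(name: str) -> ProviderFn:
    # Stand-in for a vendor SDK wrapper (hypothetical).
    return lambda model, prompt: f"[{name}:{model}] {prompt}"


providers = {"provider_a": fake_provider("provider_a"), "provider_b": fake_provider("provider_b")}
routes = {"chat-default": Route("provider_a", "model-x")}
gw = Gateway(providers, routes)
print(gw.complete("chat-default", "hello"))  # [provider_a:model-x] hello

# Switching providers is a config change; the calling code is untouched.
routes["chat-default"] = Route("provider_b", "model-y")
print(gw.complete("chat-default", "hello"))  # [provider_b:model-y] hello
```

The application only ever references `chat-default`; which vendor and model sit behind that name is a routing decision the platform team owns.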
Why it matters
Without a gateway, every team writes its own provider integration, manages its own API keys, and handles its own rate limits, and no one has visibility into total cost across teams. A gateway centralizes those decisions and makes the AI provider choice a routing rule the platform team controls.
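Rate limiting is one of the concerns that moves out of individual applications and into the gateway. Below is a minimal sketch using a per-tenant token bucket, assuming limits are counted in requests per tenant (real gateways often limit model tokens as well). `TokenBucket`, `TenantRateLimiter`, and the injectable clock are hypothetical names, and the clock is injectable only so the behavior can be shown deterministically.

```python
import time
from typing import Callable, Dict


class TokenBucket:
    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity              # maximum burst, in requests
        self.refill_per_sec = refill_per_sec  # sustained rate, in requests per second
        self.tokens = capacity                # start full so a new tenant can burst
        self.last = now

    def try_take(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class TenantRateLimiter:
    """Hypothetical gateway component: one bucket per tenant, enforced centrally."""

    def __init__(self, capacity: float, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.buckets: Dict[str, TokenBucket] = {}

    def allow(self, tenant: str) -> bool:
        now = self.clock()
        bucket = self.buckets.get(tenant)
        if bucket is None:
            bucket = self.buckets[tenant] = TokenBucket(self.capacity, self.refill_per_sec, now)
        return bucket.try_take(now)


t = [0.0]  # fake clock for the example
limiter = TenantRateLimiter(capacity=2, refill_per_sec=1.0, clock=lambda: t[0])
print([limiter.allow("team-a") for _ in range(3)])  # [True, True, False]
print(limiter.allow("team-b"))                       # True (separate bucket)
t[0] = 1.0
print(limiter.allow("team-a"))                       # True (one token refilled)
```

Keeping the buckets in one place means a noisy tenant exhausts its own budget without consuming another team's share of the provider's quota.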
How it works
The standard pattern is to route each request by policy (privacy class, cost budget, latency target, quality requirement), fail over to another provider on an outage or rate limit, attribute cost per tenant or per workflow, log every request as a trace, and run safety classifiers before a response is returned to the application. Open-source gateways (Portkey, LiteLLM, Helicone) and managed offerings (Vercel AI Gateway, Cloudflare AI Gateway) all implement variations of this; AWS Bedrock covers part of it by exposing models from several vendors behind one API.
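The routing, failover, cost-attribution, and tracing steps can be sketched as a small router. It filters candidates by privacy class, latency, and quality, tries the survivors cheapest first, fails over when a provider raises, adds the cost to the calling tenant's spend, and appends one trace record per attempt. All names are hypothetical. Cost is computed from a caller-supplied token count rather than provider-reported usage, the cost-budget check and safety classifiers are omitted, and traces are plain dicts instead of a real tracing backend.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, FrozenSet, List


class ProviderUnavailable(Exception):
    """Hypothetical error a provider adapter raises on an outage or rate limit."""


@dataclass(frozen=True)
class Candidate:
    name: str
    privacy_classes: FrozenSet[str]  # data classes this deployment may receive
    p50_latency_ms: int
    quality: int                     # hypothetical internal quality score
    usd_per_1k_tokens: float
    call: Callable[[str], str]       # adapter around the provider SDK


@dataclass(frozen=True)
class Policy:
    privacy_class: str
    max_latency_ms: int
    min_quality: int


def eligible(candidates: List[Candidate], policy: Policy) -> List[Candidate]:
    """Filter by privacy, latency, and quality, then order cheapest first."""
    ok = [
        c for c in candidates
        if policy.privacy_class in c.privacy_classes
        and c.p50_latency_ms <= policy.max_latency_ms
        and c.quality >= policy.min_quality
    ]
    return sorted(ok, key=lambda c: c.usd_per_1k_tokens)


class Router:
    def __init__(self, candidates: List[Candidate]):
        self.candidates = candidates
        self.spend: DefaultDict[str, float] = defaultdict(float)  # tenant -> USD
        self.traces: List[dict] = []  # one record per provider attempt

    def complete(self, tenant: str, policy: Policy, prompt: str, tokens: int) -> str:
        errors = []
        for c in eligible(self.candidates, policy):
            try:
                out = c.call(prompt)
            except ProviderUnavailable as exc:
                errors.append(f"{c.name}: {exc}")
                self.traces.append({"tenant": tenant, "provider": c.name, "ok": False})
                continue  # fail over to the next eligible candidate
            cost = tokens / 1000 * c.usd_per_1k_tokens
            self.spend[tenant] += cost  # attribute spend to the calling tenant
            self.traces.append({"tenant": tenant, "provider": c.name, "ok": True, "usd": cost})
            return out
        raise RuntimeError(f"no eligible provider succeeded: {errors}")


def rate_limited(prompt: str) -> str:
    raise ProviderUnavailable("rate limited")


def healthy(prompt: str) -> str:
    return "answer"


router = Router([
    Candidate("cheap-hosted", frozenset({"public"}), 300, 3, 0.5, rate_limited),
    Candidate("private-deploy", frozenset({"public", "confidential"}), 800, 3, 2.0, healthy),
])
print(router.complete("team-a", Policy("public", 1000, 2), "hi", tokens=1500))  # answer
print(router.spend["team-a"])  # 3.0
```

Here `cheap-hosted` is tried first because it is cheaper, fails with a rate limit, and the request falls over to `private-deploy`, which is billed to `team-a`. A request tagged `confidential` would never reach `cheap-hosted` at all, because privacy is applied as a hard filter before price ordering; ranking the survivors by latency or quality instead of price would be an equally reasonable policy.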
Related resources
How an AI system decides which model to call for each step — based on privacy, cost, latency, quality, and what happens when a provider goes down.
The component of an AI platform that routes every model call — choosing the provider, applying rate limits and fallback, attributing cost, and emitting traces — so applications never call providers directly.
The total spend on language-model APIs across an organization — input tokens, output tokens, embeddings, fine-tuning — and the practice of attributing, optimizing, and budgeting it.