Agent runtime

Responses API

OpenAI's API for stateful, multi-turn agent interactions — a successor to the Assistants API designed for production agent workloads with built-in tool use and state management.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

The Responses API is OpenAI's agent-oriented API surface, introduced in March 2025 as the successor to the Assistants API. It bundles model calls, tool use, state management, and streaming into one interface aimed at production agent workloads rather than ad-hoc chat. It is one of several competing "agent platform" offerings from major providers.
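As a rough illustration of the state management described above, the sketch below chains turns by passing the previous response's ID back to the API, so the provider holds the conversation history rather than the caller. It assumes the official `openai` Python SDK, whose client exposes `responses.create` with `model`, `input`, and `previous_response_id` parameters. The helper name `send_turn` and the model name in the usage note are illustrative placeholders, not something this page defines.

```python
from typing import Any, Optional, Tuple


def send_turn(
    client: Any,
    model: str,
    user_text: str,
    previous_response_id: Optional[str] = None,
) -> Tuple[str, str]:
    """Send one turn and return (response_id, output_text).

    `client` is assumed to be an `openai.OpenAI()` instance, or anything
    exposing the same `responses.create(...)` method.
    """
    kwargs = {"model": model, "input": user_text}
    # Only link to an earlier turn when one exists; the provider then
    # restores the prior context on its side.
    if previous_response_id is not None:
        kwargs["previous_response_id"] = previous_response_id

    response = client.responses.create(**kwargs)
    return response.id, response.output_text
```

With a real client this might be used as `rid, text = send_turn(OpenAI(), "your-model-name", "Summarize the ticket")`, followed by `send_turn(client, "your-model-name", "Now draft a reply", previous_response_id=rid)`. The second call sends only the new message; that convenience is exactly the provider-held state that makes the code less portable.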

Why it matters

Building production agents requires plumbing — state, streaming, tool dispatch, retries — that every team would otherwise build themselves. Provider-side agent APIs offer that plumbing as a managed service. The trade-off is portability: lean on the provider's API and switching providers becomes a rewrite, not a routing rule.

How we use it

Where the Responses API fits a workflow's needs, we use it through the model gateway, with the same governance and observability as any other route. Where portability or custom orchestration matters more, we call OpenAI through the standard Chat Completions endpoint and run our own runtime around it.
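The choice between the two paths can be pictured as a small routing rule. The sketch below is a hypothetical illustration only: the `Workflow` fields and the route names `openai/responses` and `openai/chat-completions` are assumptions made for the example, not the gateway's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workflow:
    """Hypothetical description of what a workflow needs from its runtime."""
    name: str
    needs_portability: bool = False
    needs_custom_orchestration: bool = False


def choose_route(workflow: Workflow) -> str:
    """Return an illustrative gateway route name for a workflow.

    Workflows that must stay provider-portable, or that need their own
    orchestration, go to the plain Chat Completions route and our runtime.
    Everything else uses the managed Responses API route.
    """
    if workflow.needs_portability or workflow.needs_custom_orchestration:
        return "openai/chat-completions"  # hypothetical route name
    return "openai/responses"  # hypothetical route name
```

For example, `choose_route(Workflow("ticket-triage"))` selects the Responses route, while `choose_route(Workflow("multi-provider-eval", needs_portability=True))` selects the Chat Completions route. Either way the call still passes through the gateway, so governance and observability apply to both.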
