Operations

WebSocket Mode

An interaction mode where a client maintains a persistent WebSocket connection to the AI platform — for low-latency streaming, real-time voice, multi-turn collaboration, and live tool feedback.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

WebSocket mode is the alternative to request/response for AI workloads that need bidirectional, low-latency communication. Voice agents, real-time collaboration interfaces, and long-running agent sessions all benefit from a persistent connection where the client and platform can exchange messages, tool requests, and progress updates without re-establishing context each time.

Why it matters

HTTP request/response is fine for one-shot Q&A. For voice (where every 200ms matters), for live collaboration (where the agent and user are both typing), and for long-running sessions (where progress streams matter), WebSocket gives a substantially better user experience and lower per-call overhead.

How it works

The client opens a WebSocket to the AI platform. The platform binds the connection to a session (with personality, memory, and tool scope). The two sides exchange typed messages: user input, model output (streamed token-by-token), tool call requests, tool results, status updates. The session persists for the lifetime of the connection. OpenAI Realtime, Anthropic streaming, and most managed voice platforms use this pattern.

Related resources