Operations

Trace Replay

Deterministically re-running an AI workflow from its stored trace — the debugging primitive that makes 'why did the agent do that' a question with an answer.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Trace replay is the ability to take a workflow run from last Tuesday — with all its captured inputs, retrieved context, model outputs, and tool results — and re-execute it deterministically in dev. The developer can step through, change a prompt or model, and see how the outcome would have differed.

Why it matters

Agents are non-deterministic. Without replay, debugging a bad output is guesswork: you ask the user what they typed, you guess at what was retrieved, you can't reproduce. With replay, the bad run becomes a unit test.

How it works

Traces capture every input that could change behavior: event payload, retrieved chunks, tool inputs and outputs, model parameters and seeds where available, the model's response. Replay re-runs the workflow against captured inputs; differences from production are themselves a tracked signal. Tools like Langfuse, Arize Phoenix, and Helicone support this pattern.

Related resources