Root-Cause Analysis
Diagnosing whether a cluster of bad threads stems from a retrieval, reasoning, tool, knowledge, or policy failure, and routing the fix to the right surface.
Treating every bad thread as a prompt problem is how you ship the same prompt three times. Root-cause analysis routes the fix to the surface that actually needs to change.
What it solves
Stops the team from fixing the symptom instead of the cause, so that a cluster, once closed, stays closed.
How we build it
Replay a sample of threads from the cluster with full trace context. Categorize the cause by the first stage that went wrong: retrieval (wrong chunks returned), reasoning (right chunks, wrong inference), tool (right inference, wrong action), knowledge (right tool, wrong source), or policy (right answer, wrong refusal). Route the cluster to the team that owns the cause.
- Replay sample with full trace context
- Cause taxonomy per cluster
- Routing to the team owning the cause
- Closure tracked per category
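The taxonomy and routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not the actual implementation: the `TraceReview` fields, the `OWNERS` team names, and the majority-vote routing rule are all assumptions made for the example. It also assumes that each category is the first stage that failed, and that a thread with every stage correct is a policy failure.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Cause(Enum):
    RETRIEVAL = "retrieval"   # wrong chunks returned
    REASONING = "reasoning"   # right chunks, wrong inference
    TOOL = "tool"             # right inference, wrong action
    KNOWLEDGE = "knowledge"   # right tool, wrong source
    POLICY = "policy"         # right answer, wrong refusal


# Hypothetical team names; substitute whoever owns each surface.
OWNERS = {
    Cause.RETRIEVAL: "retrieval-team",
    Cause.REASONING: "prompt-team",
    Cause.TOOL: "integrations-team",
    Cause.KNOWLEDGE: "content-team",
    Cause.POLICY: "policy-team",
}


@dataclass
class TraceReview:
    """A reviewer's verdict on one replayed thread (hypothetical fields)."""
    thread_id: str
    chunks_ok: bool
    inference_ok: bool
    action_ok: bool
    source_ok: bool


def categorize(review: TraceReview) -> Cause:
    """Return the first stage that failed, in pipeline order."""
    if not review.chunks_ok:
        return Cause.RETRIEVAL
    if not review.inference_ok:
        return Cause.REASONING
    if not review.action_ok:
        return Cause.TOOL
    if not review.source_ok:
        return Cause.KNOWLEDGE
    # Assumption: every stage was right, so the bad outcome was a refusal.
    return Cause.POLICY


def route_cluster(reviews: list[TraceReview]) -> tuple[Cause, str]:
    """Route a cluster to the owner of its most common cause."""
    if not reviews:
        raise ValueError("cannot route a cluster with no reviewed threads")
    counts = Counter(categorize(r) for r in reviews)
    cause, _ = counts.most_common(1)[0]
    return cause, OWNERS[cause]
```

Routing on the most common cause is only one possible rule. A cluster whose sample splits evenly across two causes may be better handled by splitting the cluster than by picking one owner.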
What changes when it is in place
Fixes land on the surface that caused the failure, so closed clusters recur less often.
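One way to check whether recurrence is actually dropping is to compute it per cause category. The sketch below assumes a hypothetical record format, a `(category, recurred)` pair for each closed cluster, where `recurred` means the same cluster reappeared after closure. How recurrence is detected and over what time window is left to the team.

```python
from collections import defaultdict


def recurrence_by_category(closed_clusters):
    """Return the fraction of closed clusters that recurred, per category.

    closed_clusters: iterable of (category, recurred) pairs (hypothetical format).
    """
    totals = defaultdict(int)
    recurred = defaultdict(int)
    for category, did_recur in closed_clusters:
        totals[category] += 1
        if did_recur:
            recurred[category] += 1
    # Every category in totals has at least one cluster, so no division by zero.
    return {category: recurred[category] / totals[category] for category in totals}


history = [("retrieval", False), ("retrieval", True), ("policy", False)]
rates = recurrence_by_category(history)
```

A category whose rate stays high after fixes ship suggests its clusters are being routed to the wrong surface.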