Use case
Latency Budgets
Per-workflow p95 and p99 latency targets, enforced as promotion gates and tracked in production, so that 'feels slow' becomes a quantified, attributable signal.
Overview
An interactive workflow with a p95 above its budget is a UX problem the model team often does not see. Budgets make latency a first-class constraint.
What it solves
Surfaces latency regressions before they ship and attributes them to the change that caused them.
How we build it
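Each workflow declares a p95 and a p99 latency budget. The harness measures both percentiles on the eval set, and a candidate that exceeds either budget is held back from promotion. Production telemetry tracks budget compliance, and any excursion triggers an alert that names the responsible step.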
- Per-workflow p95 and p99 budget
- Pre-promotion enforcement
- Production compliance telemetry
- Step-level attribution on excursion
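As a rough illustration of the gate and the attribution step, here is a minimal Python sketch. Everything named in it is hypothetical and not taken from the actual harness: the LatencyBudget class, the check_budget and responsible_step helpers, the workflow and step names, and the sample numbers. It assumes latencies are recorded in milliseconds, uses a nearest-rank percentile, and attributes an excursion to the step whose p95 grew most against a baseline run, which is only one of several reasonable heuristics. Production telemetry is out of scope here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Hypothetical budget declaration for one workflow, in milliseconds."""
    workflow: str
    p95_ms: float
    p99_ms: float


def percentile(samples, q):
    """Nearest-rank percentile: the smallest sample with at least q% of the data at or below it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def check_budget(budget, latencies_ms):
    """Return (passed, measured) for one run over the eval set."""
    measured = {
        "p95": percentile(latencies_ms, 95),
        "p99": percentile(latencies_ms, 99),
    }
    passed = measured["p95"] <= budget.p95_ms and measured["p99"] <= budget.p99_ms
    return passed, measured


def responsible_step(baseline_steps, candidate_steps):
    """Name the step whose p95 grew most against the baseline (assumed heuristic)."""
    deltas = {
        step: percentile(candidate_steps[step], 95) - percentile(baseline_steps[step], 95)
        for step in candidate_steps
        if step in baseline_steps
    }
    if not deltas:
        return None
    return max(deltas, key=deltas.get)


# Illustrative data only: 100 eval-set latencies for a made-up workflow.
budget = LatencyBudget("support-triage", p95_ms=800, p99_ms=1500)
eval_latencies = [400.0] * 90 + [700.0] * 8 + [1200.0, 1800.0]
passed, measured = check_budget(budget, eval_latencies)
# passed is True; measured is {"p95": 700.0, "p99": 1200.0}

# Per-step latencies from a baseline run and a candidate run.
baseline = {"retrieve": [100.0] * 20, "generate": [300.0] * 20}
candidate = {"retrieve": [110.0] * 20, "generate": [520.0] * 20}
culprit = responsible_step(baseline, candidate)  # "generate"
```

A promotion pipeline could refuse to promote whenever passed is False and attach the measured percentiles and the culprit step to the alert. The single worst sample (1800 ms) does not fail the gate in this example, because with 100 samples the nearest-rank p99 is the 99th ordered value.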
What changes when it is in place
Latency stops being a complaint and becomes a measurable, attributable engineering signal.