Use case

Latency Budgets

Per-workflow p95 and p99 latency targets, enforced as promotion gates and tracked in production, so that "feels slow" becomes a quantified, attributable signal.

Overview

An interactive workflow whose p95 latency exceeds its budget is a UX problem that the model team often does not see. Budgets make latency a first-class constraint rather than an afterthought.

What it solves

Surfaces latency regressions before they ship and attributes them to the change that caused them.

How we build it

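Each workflow declares a p95 and a p99 budget. Before promotion, the harness measures both percentiles on the eval set and blocks the change if either exceeds its budget. In production, telemetry tracks compliance with the same budgets, and an excursion triggers an alert that names the responsible step.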
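A minimal sketch of a budget declaration and the pre-promotion check, assuming the harness records one end-to-end latency per eval-set request, in milliseconds. `LatencyBudget`, `percentile`, `gate`, and the `support_chat` workflow are hypothetical names, and the nearest-rank percentile is one definition among several (interpolating definitions give slightly different values on small eval sets).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyBudget:
    """Hypothetical per-workflow budget declaration, in milliseconds."""
    workflow: str
    p95_ms: float
    p99_ms: float


def percentile(samples: list[float], q: int) -> float:
    """Nearest-rank percentile: the smallest sample with at least q% of samples <= it."""
    if not samples:
        raise ValueError("no latency samples")
    ordered = sorted(samples)
    rank = max(math.ceil(q * len(ordered) / 100), 1)  # 1-based rank
    return ordered[rank - 1]


def gate(budget: LatencyBudget, eval_latencies_ms: list[float]) -> tuple[bool, dict[str, float]]:
    """Pre-promotion check: pass only if both eval-set percentiles are within budget."""
    measured = {
        "p95_ms": percentile(eval_latencies_ms, 95),
        "p99_ms": percentile(eval_latencies_ms, 99),
    }
    passed = measured["p95_ms"] <= budget.p95_ms and measured["p99_ms"] <= budget.p99_ms
    return passed, measured


# Example: 100 eval requests, mostly fast with a slow tail (hypothetical numbers).
budget = LatencyBudget("support_chat", p95_ms=1200.0, p99_ms=2000.0)
latencies = [800.0] * 95 + [1500.0] * 4 + [2500.0]
passed, measured = gate(budget, latencies)
print(passed, measured)  # True {'p95_ms': 800.0, 'p99_ms': 1500.0}
```

Returning the measured percentiles alongside the verdict lets a promotion report show how much headroom a change leaves, not just whether it passed.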

  • Per-workflow p95 and p99 budgets
  • Pre-promotion enforcement
  • Production compliance telemetry
  • Step-level attribution on excursion
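The production side can be sketched the same way, under more assumptions: each request is recorded as a trace mapping step names to durations in milliseconds, steps run sequentially so end-to-end latency is their sum, and a baseline window from before the excursion is available. `Trace`, `Excursion`, and `check_window` are hypothetical, only the p95 budget is checked for brevity, and blaming the step whose own p95 grew the most is one simple attribution heuristic, not the only possible one.

```python
import math
from dataclasses import dataclass
from typing import Optional

# Hypothetical trace shape: step name -> duration in ms, one dict per request.
Trace = dict[str, float]


def percentile(samples: list[float], q: int) -> float:
    """Nearest-rank percentile, the same definition used at promotion time."""
    ordered = sorted(samples)
    return ordered[max(math.ceil(q * len(ordered) / 100), 1) - 1]


@dataclass
class Excursion:
    """What the alert would carry (hypothetical fields)."""
    workflow: str
    observed_p95_ms: float
    budget_p95_ms: float
    responsible_step: str
    step_p95_delta_ms: float


def check_window(workflow: str, budget_p95_ms: float,
                 baseline: list[Trace], current: list[Trace]) -> Optional[Excursion]:
    """Return an Excursion if the current window's end-to-end p95 exceeds the budget."""
    if not current:
        return None  # no traffic in this window, nothing to judge
    # Assumes sequential steps, so end-to-end latency is the sum of step durations.
    observed = percentile([sum(t.values()) for t in current], 95)
    if observed <= budget_p95_ms:
        return None  # within budget: counts toward compliance, no alert
    deltas: dict[str, float] = {}
    for step in sorted({s for t in current for s in t}):  # sorted for deterministic ties
        cur = [t[step] for t in current if step in t]
        base = [t[step] for t in baseline if step in t]
        # Heuristic: a step missing from the baseline counts entirely as new latency.
        deltas[step] = percentile(cur, 95) - (percentile(base, 95) if base else 0.0)
    worst = max(deltas, key=deltas.get)  # step whose own p95 grew the most
    return Excursion(workflow, observed, budget_p95_ms, worst, deltas[worst])


# Example: the hypothetical "generate" step slows down by 500 ms.
baseline = [{"retrieve": 200.0, "generate": 600.0}] * 20
current = [{"retrieve": 200.0, "generate": 1100.0}] * 20
exc = check_window("support_chat", 1200.0, baseline, current)
print(exc.responsible_step, exc.step_p95_delta_ms)  # generate 500.0
```

Running `check_window` on each rolling window and recording whether it returned `None` yields a compliance rate to chart over time; a non-`None` result carries what the alert needs to name the step.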

What changes when it is in place

Latency stops being a complaint and becomes a measurable, attributable engineering signal.