Use case
Drift Alerts
Alerts when production behavior diverges from eval-set expectations — a model deprecation, a prompt change in upstream library, or a corpus that has quietly aged.
Overview
Models change behavior under the same name. Corpora age. Prompts drift through copy-paste. Drift alerts catch the changes the team did not consciously ship.
What it solves
Surfaces silent regressions before they accumulate. Catches the cases where the model the team thought it was running has been updated underneath.
How we build it
A scheduled comparison between production behavior and eval-set baseline: refusal rate, average length, sentiment shift, citation rate, score on a held-out sample. Material changes alert with a candidate cause.
- Scheduled production-vs-eval comparison
- Behavior metrics (refusal, length, sentiment)
- Candidate cause on alert
- Tunable thresholds per workflow
What changes when it is in place
The team finds out about model and corpus drift in time to react.