Use case

Drift Alerts

Alerts when production behavior diverges from eval-set expectations: a model deprecation, a prompt change in an upstream library, or a corpus that has quietly aged.

Overview

Models change behavior under the same name. Corpora age. Prompts drift through copy-paste. Drift alerts catch the changes the team did not consciously ship.

What it solves

Surfaces silent regressions before they accumulate. Catches the cases where the model the team thought it was running has been updated underneath it.

How we build it

A scheduled comparison between production behavior and the eval-set baseline: refusal rate, average length, sentiment shift, citation rate, and score on a held-out sample. Material changes trigger an alert that names a candidate cause.

Scheduled production-vs-eval comparison
Behavior metrics (refusal, length, sentiment)
Candidate cause on alert
Tunable thresholds per workflow
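
As a rough illustration of the comparison described above, the sketch below computes two of the behavior metrics (refusal rate and average length) for a baseline batch and a production batch, compares the differences against per-metric thresholds, and attaches a candidate cause to each alert. Everything named here is an assumption made for the example, not the actual implementation: the `Response` record and its `model_version` field, the phrase-matching refusal check, the absolute-difference thresholds, and the version-diff heuristic for the candidate cause are all hypothetical.

```python
from dataclasses import dataclass
from statistics import mean

# Assumption: refusals are detected by simple phrase matching.
# A real system would more likely use a classifier or a rubric.
REFUSAL_MARKERS = ("i can't", "i cannot", "i am unable")


@dataclass
class Response:
    text: str
    model_version: str  # hypothetical field: version string logged with each call


def behavior_metrics(responses):
    """Aggregate behavior metrics over a non-empty batch of responses."""
    refusals = sum(
        any(marker in r.text.lower() for marker in REFUSAL_MARKERS)
        for r in responses
    )
    return {
        "refusal_rate": refusals / len(responses),
        "avg_length": mean(len(r.text.split()) for r in responses),
    }


def candidate_cause(baseline, production):
    """Guess a cause: a model version seen in production but not in the baseline."""
    new_versions = {r.model_version for r in production} - {
        r.model_version for r in baseline
    }
    if new_versions:
        return "model version changed: " + ", ".join(sorted(new_versions))
    return "no version change seen; check prompts and corpus"


def check_drift(baseline, production, thresholds):
    """Return one alert per metric whose absolute change exceeds its threshold."""
    base = behavior_metrics(baseline)
    prod = behavior_metrics(production)
    alerts = []
    for metric, limit in thresholds.items():
        if abs(prod[metric] - base[metric]) > limit:
            alerts.append(
                {
                    "metric": metric,
                    "baseline": base[metric],
                    "production": prod[metric],
                    "candidate_cause": candidate_cause(baseline, production),
                }
            )
    return alerts
```

In this sketch the `thresholds` dictionary plays the role of the per-workflow tuning: each workflow would pass its own limits, for example `{"refusal_rate": 0.1, "avg_length": 5.0}`, and a scheduler would call `check_drift` on each run.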

What changes when it is in place

The team finds out about model and corpus drift in time to react.