Reranking Policy
How many candidates to rerank, with which model, and against which signals: tuned per workflow against the eval set instead of copied from a tutorial.
Reranking improves quality at the cost of latency and dollars. The right policy is the one your eval set picks, not the default in a framework example.
What it solves
Makes the rerank-vs-no-rerank, top-50-vs-top-100 trade-off measurable instead of intuitive. Lets latency-sensitive workflows tune rerank depth without sacrificing quality on slow-path workflows.
How we build it
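Per-workflow rerank policy: depth, reranker model, threshold for early exit. The eval set runs the same queries against multiple configurations and keeps the Pareto-optimal points on quality, latency, and cost, meaning the configurations that no other configuration beats on all three at once. Each workflow then takes the point that fits its budget.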
- Per-workflow depth and model
- Threshold-based early exit
- Quality vs latency vs cost Pareto frontier
- Tunable from the eval set
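As a rough sketch of what a per-workflow policy with threshold-based early exit might look like. Every name here (`RerankPolicy`, `rerank_with_early_exit`, `score_fn`, the batch size) is a hypothetical chosen for illustration, and the stopping rule is only one plausible reading of "early exit": score candidates in first-stage order, batch by batch, and stop once enough of them clear the threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class RerankPolicy:
    """Hypothetical per-workflow policy; field names are assumptions."""
    depth: int              # max first-stage candidates sent to the reranker
    model: str              # reranker identifier (placeholder string)
    exit_threshold: float   # score a candidate must reach to count as "good"
    keep: int = 5           # number of results the workflow actually needs
    batch_size: int = 10    # candidates scored per reranker call


def rerank_with_early_exit(
    query: str,
    candidates: Sequence[str],
    policy: RerankPolicy,
    score_fn: Callable[[str, Sequence[str]], List[float]],
) -> Tuple[List[Tuple[str, float]], int]:
    """Score candidates batch by batch; stop once `keep` of them clear the threshold.

    Returns the top `keep` (candidate, score) pairs and how many candidates were scored.
    """
    scored: List[Tuple[str, float]] = []
    pool = candidates[: policy.depth]
    for start in range(0, len(pool), policy.batch_size):
        batch = pool[start : start + policy.batch_size]
        scored.extend(zip(batch, score_fn(query, batch)))
        good = sum(1 for _, s in scored if s >= policy.exit_threshold)
        if good >= policy.keep:
            break  # early exit: enough confident results, skip remaining batches
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[: policy.keep], len(scored)


# Toy scorer standing in for a real reranker: fraction of query words found in the doc.
def toy_score(query: str, docs: Sequence[str]) -> List[float]:
    words = set(query.lower().split())
    return [len(words & set(d.lower().split())) / len(words) for d in docs]


docs = ["rerank depth tuning", "rerank depth", "unrelated text", "cooking recipes"]
policy = RerankPolicy(depth=4, model="toy", exit_threshold=0.9, keep=1, batch_size=2)
top, n_scored = rerank_with_early_exit("rerank depth", docs, policy, toy_score)
```

With this toy data the first batch already contains a candidate above the threshold, so only two of the four candidates are scored. A latency-sensitive workflow would presumably pair a shallow `depth` with an aggressive threshold, while a slow-path workflow could set the threshold high enough that early exit rarely triggers.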
What changes when it is in place
Rerank stops being a binary choice. Each workflow gets the configuration its quality and latency budget actually justify.
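One way to picture how a workflow ends up with "its" configuration: filter the eval results down to the non-dominated set, then take the highest-quality point that fits the workflow's latency and cost budget. This is a minimal sketch under stated assumptions. The `EvalResult` fields, the function names, and all of the numbers are made up for illustration and are not measurements.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class EvalResult:
    """One rerank configuration measured on the eval set (fields are assumptions)."""
    name: str
    quality: float      # higher is better, e.g. nDCG@10
    latency_ms: float   # lower is better, e.g. p95 latency
    cost: float         # lower is better, e.g. dollars per 1k queries


def dominates(a: EvalResult, b: EvalResult) -> bool:
    """True if `a` is no worse than `b` on every axis and strictly better on at least one."""
    no_worse = a.quality >= b.quality and a.latency_ms <= b.latency_ms and a.cost <= b.cost
    better = a.quality > b.quality or a.latency_ms < b.latency_ms or a.cost < b.cost
    return no_worse and better


def pareto_frontier(results: List[EvalResult]) -> List[EvalResult]:
    """Keep only configurations that no other configuration dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


def pick_for_budget(
    results: List[EvalResult], max_latency_ms: float, max_cost: float
) -> Optional[EvalResult]:
    """Best-quality frontier point within the budget, or None if nothing fits."""
    feasible = [
        r for r in pareto_frontier(results)
        if r.latency_ms <= max_latency_ms and r.cost <= max_cost
    ]
    return max(feasible, key=lambda r: r.quality, default=None)


# Illustrative, made-up eval results.
results = [
    EvalResult("no-rerank", 0.61, 40, 0.00),
    EvalResult("top50-small", 0.70, 120, 0.40),
    EvalResult("top100-small", 0.72, 210, 0.80),
    EvalResult("top50-large", 0.69, 300, 1.50),   # dominated by top50-small
    EvalResult("top100-large", 0.78, 520, 3.00),
]

fast_path = pick_for_budget(results, max_latency_ms=150, max_cost=1.0)
slow_path = pick_for_budget(results, max_latency_ms=1000, max_cost=5.0)
```

In this toy data the latency-sensitive workflow lands on `top50-small` and the slow-path workflow on `top100-large`, while `top50-large` is never chosen because a cheaper, faster configuration matches or beats it on quality.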