Use case

Reranking Policy

How many candidates to rerank, with which model, and against which signals: tuned per workflow against the eval set instead of copied from a tutorial.

Overview

Reranking improves quality at the cost of latency and dollars. The right policy is the one your eval set picks, not the default in a framework example.

What it solves

Makes the rerank-vs-no-rerank and top-50-vs-top-100 trade-offs measurable instead of intuitive. Latency-sensitive workflows can reduce rerank depth on their own, while slow-path workflows keep the deeper rerank and the quality that comes with it.

How we build it

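Each workflow gets its own rerank policy: depth, reranker model, and threshold for early exit. The same eval-set queries run against multiple configurations; configurations that are Pareto-dominated on quality, latency, and cost are dropped, and the workflow takes the remaining point that fits its latency and cost budget.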
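The selection step can be sketched in a few lines. This is a minimal illustration, not the production implementation: the names `RerankPolicy`, `EvalResult`, `pareto_frontier`, and `pick_policy`, the metric choices (mean quality score, p95 latency, cost per 1,000 queries), the model strings, and all numbers are assumptions made up for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RerankPolicy:
    """One candidate configuration for a workflow (hypothetical shape)."""
    depth: int             # how many retrieved candidates go to the reranker
    model: str             # reranker identifier; placeholder strings below
    exit_threshold: float  # score threshold for early exit


@dataclass(frozen=True)
class EvalResult:
    """Eval-set measurements for one policy (assumed metrics)."""
    policy: RerankPolicy
    quality: float         # higher is better, e.g. a mean ranking metric
    p95_latency_ms: float  # lower is better
    cost_per_1k: float     # lower is better, dollars per 1,000 queries


def dominates(a: EvalResult, b: EvalResult) -> bool:
    """True if a is no worse than b on every axis and strictly better on one."""
    no_worse = (a.quality >= b.quality
                and a.p95_latency_ms <= b.p95_latency_ms
                and a.cost_per_1k <= b.cost_per_1k)
    strictly_better = (a.quality > b.quality
                       or a.p95_latency_ms < b.p95_latency_ms
                       or a.cost_per_1k < b.cost_per_1k)
    return no_worse and strictly_better


def pareto_frontier(results: List[EvalResult]) -> List[EvalResult]:
    """Keep only results that no other result dominates."""
    return [r for r in results if not any(dominates(o, r) for o in results)]


def pick_policy(results: List[EvalResult],
                max_latency_ms: float,
                max_cost_per_1k: float) -> Optional[EvalResult]:
    """Best-quality frontier point inside the workflow's budget, or None."""
    feasible = [r for r in pareto_frontier(results)
                if r.p95_latency_ms <= max_latency_ms
                and r.cost_per_1k <= max_cost_per_1k]
    return max(feasible, key=lambda r: r.quality, default=None)


# Made-up measurements, for illustration only.
results = [
    EvalResult(RerankPolicy(0, "none", 0.0), 0.61, 40.0, 0.10),
    EvalResult(RerankPolicy(50, "small-reranker", 0.9), 0.72, 120.0, 0.60),
    EvalResult(RerankPolicy(100, "small-reranker", 0.9), 0.74, 210.0, 1.10),
    # Dominated by the row above: lower quality, slower, and more expensive.
    EvalResult(RerankPolicy(100, "large-reranker", 0.9), 0.73, 400.0, 3.00),
]

fast = pick_policy(results, max_latency_ms=150.0, max_cost_per_1k=1.0)
slow = pick_policy(results, max_latency_ms=500.0, max_cost_per_1k=5.0)
print(fast.policy.depth, slow.policy.depth)  # depth 50 for fast, depth 100 for slow
```

With three objectives there is usually no single configuration that wins on every axis, so the sketch keeps the whole frontier and lets each workflow's budget choose the point. The latency-sensitive workflow lands on the shallower rerank; the slow-path workflow takes the deeper one.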

Per-workflow depth and model
Threshold-based early exit
Quality vs latency vs cost Pareto curve
Tunable from the eval set
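
Threshold-based early exit can take more than one form, and the page does not specify which. The sketch below assumes one plausible reading: candidates are scored in batches up to the policy depth, and scoring stops once the current top-k all clear the threshold. The function name `rerank_with_early_exit`, its parameters, and the toy scorer are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def rerank_with_early_exit(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[str, str], float],  # stand-in for a reranker call
    depth: int,
    exit_threshold: float,
    top_k: int = 3,
    batch_size: int = 10,
) -> Tuple[List[Tuple[float, str]], int]:
    """Score candidates batch by batch, up to `depth`.

    Stops early once the top_k best scores so far all reach exit_threshold.
    Returns (top_k scored candidates, number of candidates actually scored).
    """
    limit = min(depth, len(candidates))
    scored: List[Tuple[float, str]] = []
    n_scored = 0
    while n_scored < limit:
        batch = candidates[n_scored:min(n_scored + batch_size, limit)]
        scored.extend((score_fn(query, doc), doc) for doc in batch)
        n_scored += len(batch)
        # Sort on the score only, best first.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        if len(scored) >= top_k and scored[top_k - 1][0] >= exit_threshold:
            break  # the top_k are already good enough; skip remaining batches
    return scored[:top_k], n_scored


# Toy scorer: the first three documents are strong matches, the rest are weak.
toy_scores = {f"doc{i}": (0.95 if i < 3 else 0.1) for i in range(30)}
docs = list(toy_scores)

top, used = rerank_with_early_exit(
    "example query", docs, lambda q, d: toy_scores[d],
    depth=30, exit_threshold=0.9,
)
print(used)  # 10: the first batch already satisfied the threshold
```

The number of candidates actually scored is returned alongside the results so the eval run can record real latency and cost for each threshold setting rather than assuming the full depth was used.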

What changes when it is in place

Rerank stops being a binary choice. Each workflow gets the configuration its quality and latency budget actually justify.