Data substrate

Reranking

A second-stage retrieval step in which a cross-encoder model re-scores the top candidates from the first stage. In practice, this two-stage setup tends to improve answer quality on real corpora.

Operating principle

Production AI is not a prompt. It is a system of context, tools, permissions, traces, evals, and feedback loops.

What it is

Reranking is the practice of running two retrieval stages. Stage one: a fast but coarse retriever (BM25, vector search, or a hybrid of the two) finds a candidate set of, say, the top 50 chunks. Stage two: a slower but more accurate cross-encoder model re-scores those 50 against the query and returns the top 5 or 10. The final ordering is usually more relevant than what stage one alone produces.
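The two stages described above can be sketched in a few lines. This is a minimal illustration under loud assumptions, not a production retriever: `first_stage` ranks by plain word overlap instead of BM25 or vector search, and `toy_pair_score` is a hypothetical stand-in for a cross-encoder that only counts shared non-stopword terms. All function names, the stopword list, and the sample corpus are invented for the example.

```python
from typing import Callable, List, Tuple

# Hypothetical stopword list, used only by the toy stage-two scorer below.
STOPWORDS = {"how", "do", "i", "my", "a", "the", "from", "and"}


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Stage one: cheap and recall-oriented. Ranks by number of shared words."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep the top k, dropping documents with no overlap at all.
    return [doc for overlap, doc in scored[:k] if overlap > 0]


def toy_pair_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: counts shared non-stopword terms."""
    q_terms = set(query.lower().split()) - STOPWORDS
    return float(len(q_terms & set(doc.lower().split())))


def rerank(
    query: str,
    candidates: List[str],
    score_pair: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Stage two: score every (query, candidate) pair and keep the best top_n."""
    scored = [(score_pair(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


corpus = [
    "how do i change my billing plan",
    "reset a forgotten password from the login page",
    "password rules and my account",
    "office opening hours",
]
query = "how do i reset my password"

candidates = first_stage(query, corpus, k=3)
top = rerank(query, candidates, toy_pair_score, top_n=2)
```

In this toy run, stage one puts the billing document first because it shares the most words with the query, while stage two moves the password-reset document to the top. A real deployment would swap `toy_pair_score` for calls to an actual reranker model.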

Why it matters

First-stage retrievers are tuned for recall: bring back anything plausible. They are not good at picking the best 3 out of 50. A cross-encoder reranker is good at exactly that, but it needs a separate model pass for every query-chunk pair, so it would be too slow to run on the whole corpus. That is why two stages exist.
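A rough back-of-the-envelope calculation shows why the cross-encoder is limited to a short candidate list. The per-pair latency and corpus size below are hypothetical placeholders, not measurements, and the estimate ignores batching and hardware parallelism.

```python
def rerank_latency_ms(num_pairs: int, ms_per_pair: float) -> float:
    """Estimated cross-encoder time: one model pass per (query, chunk) pair."""
    return num_pairs * ms_per_pair


MS_PER_PAIR = 2.0        # hypothetical figure, not a benchmark
CORPUS_SIZE = 1_000_000  # hypothetical corpus of one million chunks
CANDIDATES = 50          # stage-one candidate count used in the text

full_corpus_ms = rerank_latency_ms(CORPUS_SIZE, MS_PER_PAIR)
two_stage_ms = rerank_latency_ms(CANDIDATES, MS_PER_PAIR)

print(f"whole corpus: {full_corpus_ms / 1000:.0f} s per query")
print(f"top {CANDIDATES} only: {two_stage_ms:.0f} ms per query")
```

Under these assumed numbers, scoring the whole corpus takes minutes per query, while scoring 50 candidates stays within an interactive budget.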

How it works

Common rerankers include Cohere Rerank, the BGE reranker models, and Voyage AI's rerank models. The reranker reads the query and each candidate chunk together as a single input, rather than encoding them separately, so it can judge relevance with full context. The latency cost is real and so is the quality lift; each workflow has to weigh the two against its own latency and cost budget.
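The "read together" step comes down to building one (query, chunk) pair per candidate and sending those pairs to the model in batches. The sketch below assumes a `predict` callable that takes a list of pairs and returns one score per pair. That shape loosely mirrors pair-scoring interfaces such as `CrossEncoder.predict` in sentence-transformers, but nothing here calls a real library: `fake_predict` is a made-up stand-in so the example runs on its own.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]
PredictFn = Callable[[List[Pair]], Sequence[float]]


def rerank_pairs(
    query: str,
    chunks: List[str],
    predict: PredictFn,
    top_n: int,
    batch_size: int = 16,
) -> List[Tuple[str, float]]:
    """Score each (query, chunk) pair jointly and return the top_n chunks."""
    pairs = [(query, chunk) for chunk in chunks]
    scores: List[float] = []
    # Batching bounds how many pairs are sent to the model per call.
    for start in range(0, len(pairs), batch_size):
        scores.extend(float(s) for s in predict(pairs[start:start + batch_size]))
    order = sorted(range(len(chunks)), key=lambda i: scores[i], reverse=True)
    return [(chunks[i], scores[i]) for i in order[:top_n]]


def fake_predict(pairs: List[Pair]) -> List[float]:
    """Hypothetical scorer: fraction of query words that appear in the chunk."""
    out = []
    for q, c in pairs:
        q_words = q.lower().split()
        c_words = set(c.lower().split())
        out.append(sum(w in c_words for w in q_words) / len(q_words))
    return out


chunks = [
    "shipping times by region",
    "our refund policy lasts 30 days",
    "refund requests need a receipt",
]
ranked = rerank_pairs("refund policy", chunks, fake_predict, top_n=2, batch_size=2)
```

Because the score depends on the pair, nothing about a chunk can be precomputed ahead of the query. This is the source of the latency cost, and `top_n` and the stage-one candidate count are the main knobs for trading quality against it.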
