Reranking
A second-stage retrieval step where a cross-encoder model re-scores the top candidates from the first stage — the configuration that consistently improves answer quality on real corpora.
What it is
Reranking is the practice of running two retrieval stages. Stage one: a fast but coarse retriever (BM25, vector search, hybrid) finds a candidate set of, say, the top 50 chunks. Stage two: a slower but more accurate cross-encoder model re-scores those 50 against the query and returns the top 5 or 10. The result is consistently better than what stage one alone produces.
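The two-stage flow can be sketched in a few lines of Python. This is a minimal sketch, not any particular library's API. `retrieve_then_rerank`, `term_overlap`, and `phrase_bonus` are hypothetical names, and the two toy scorers only stand in for a real first-stage retriever and a real cross-encoder.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer type: maps (query, chunk) to a relevance score; higher is better.
Scorer = Callable[[str, str], float]


def term_overlap(query: str, chunk: str) -> float:
    """Toy first-stage scorer (stand-in for BM25 or vector search):
    the number of distinct lowercase terms shared by query and chunk."""
    return float(len(set(query.lower().split()) & set(chunk.lower().split())))


def phrase_bonus(query: str, chunk: str) -> float:
    """Toy second-stage scorer (stand-in for a cross-encoder): term overlap
    plus a large bonus when the whole query appears verbatim in the chunk."""
    bonus = 10.0 if query.lower() in chunk.lower() else 0.0
    return term_overlap(query, chunk) + bonus


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    candidates: int = 50,
    final: int = 5,
) -> List[Tuple[str, float]]:
    # Stage one: score every chunk cheaply and keep the top `candidates`.
    shortlist = sorted(
        corpus, key=lambda chunk: first_stage(query, chunk), reverse=True
    )[:candidates]
    # Stage two: re-score only the shortlist with the expensive scorer.
    rescored = [(chunk, reranker(query, chunk)) for chunk in shortlist]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:final]


corpus = [
    "password rules for new accounts",
    "reset the router then change the password",
    "to reset password open settings",
    "holiday schedule",
]

# Stage one scores the router chunk and the settings chunk equally (2.0) and
# keeps corpus order, so on its own it would put the router chunk first.
# The reranker promotes the chunk that actually answers the query.
top = retrieve_then_rerank(
    "reset password", corpus, term_overlap, phrase_bonus, candidates=3, final=2
)
for chunk, score in top:
    print(f"{score:5.1f}  {chunk}")
```

The only coupling between the stages is the `candidates` cut-off: anything stage one drops can never be recovered by the reranker, which is why stage one is tuned for recall.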
Why it matters
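First-stage retrievers are tuned for recall: they bring back anything plausible, but they are not good at picking the best 3 out of 50. A cross-encoder reranker is good at exactly that, but it would be too slow to run on the whole corpus. A vector retriever can embed every chunk ahead of time, whereas a cross-encoder has to run the model once per (query, chunk) pair at query time. That is why two stages exist.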
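The cost argument comes down to counting model calls per query. The sketch below assumes a vector (bi-encoder) first stage whose chunk embeddings were computed offline; the function name `forward_passes_per_query` and the corpus size in the usage line are illustrative only.

```python
def forward_passes_per_query(corpus_size: int, candidates: int) -> dict:
    """Count model forward passes at query time for one query, assuming a
    vector (bi-encoder) first stage with chunk embeddings precomputed offline."""
    return {
        # Vector search alone: encode the query once; the chunk vectors
        # already exist, so the search itself is an index lookup.
        "vector_search_only": 1,
        # Cross-encoder over the whole corpus: one pass per (query, chunk)
        # pair, and nothing can be precomputed because each pass needs the query.
        "cross_encoder_everything": corpus_size,
        # Two stages: encode the query, then cross-encode only the shortlist.
        "two_stage": 1 + candidates,
    }


print(forward_passes_per_query(corpus_size=1_000_000, candidates=50))
# prints {'vector_search_only': 1, 'cross_encoder_everything': 1000000, 'two_stage': 51}
```

Multiplying each count by a reranker's measured per-pair latency shows why the full-corpus option is off the table; batching and parallel hardware shrink wall-clock time but not the amount of work.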
How it works
Common rerankers include Cohere Rerank, BGE Reranker, and Voyage Rerank. The reranker reads the query and each candidate chunk together rather than separately, so it can judge relevance with both texts in view. Both the latency cost and the quality lift are real, so each pipeline weighs the two against its latency budget, usually by tuning how many candidates stage one hands to the reranker.
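The difference between reading texts separately and together can be shown with two toy scorers. Neither is a real model. `bag_of_words_cosine` stands in for separate encoding (each text becomes a vector on its own, then the vectors are compared), and `joint_order_score` stands in for joint reading (it looks at both texts at once and checks whether the query's word pairs appear in the same order in the chunk). A real cross-encoder is a transformer that attends across the concatenated query and chunk; these hypothetical functions only illustrate why seeing both texts together matters.

```python
import math
from collections import Counter


def bag_of_words_cosine(query: str, chunk: str) -> float:
    """Toy 'separate' scorer: each text is reduced to word counts on its
    own, then the two count vectors are compared. Word order is lost."""
    q = Counter(query.lower().split())
    c = Counter(chunk.lower().split())
    dot = sum(q[term] * c[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in c.values())
    )
    return dot / norm if norm else 0.0


def joint_order_score(query: str, chunk: str) -> float:
    """Toy 'joint' scorer: reads query and chunk together and returns the
    fraction of adjacent query word pairs that also appear, in the same
    order, as adjacent words in the chunk."""
    q, c = query.lower().split(), chunk.lower().split()
    query_pairs = list(zip(q, q[1:]))
    if not query_pairs:
        return 0.0
    chunk_pairs = list(zip(c, c[1:]))
    hits = sum(1 for pair in query_pairs if pair in chunk_pairs)
    return hits / len(query_pairs)


query = "man bites dog"
wrong, right = "dog bites man", "man bites dog"
print("separate:", bag_of_words_cosine(query, wrong), bag_of_words_cosine(query, right))
print("joint:   ", joint_order_score(query, wrong), joint_order_score(query, right))
```

The separate scorer gives both chunks the same score, because word order disappears once each text is reduced to counts independently; the joint scorer tells them apart.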
Related resources
How an AI agent finds the right document, chunk, or row to ground its answer in — and why the part that matters is the pipeline around the database, not the database itself.
A capability in the Group e-media information AI stack. This resource connects the subject to data substrate, agent runtime, evals, and operations.
The pattern where an AI agent retrieves relevant context from your data before generating an answer — instead of relying only on what the model learned during training.