Reranker
What it is
A reranker is a neural model that re-scores a small candidate set (typically the top 50 to top 500 documents) returned by first-stage retrieval. Rerankers trade added latency for accuracy: they score each query-document pair jointly, capturing interactions that first-stage signals, such as BM25 term statistics or independently computed embeddings, do not fully exploit.
[illustrate: First-stage retrieval ranking 100 candidates; reranker re-scoring top-50; reranked final list showing accuracy improvement]
How it works
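- First-stage retrieval:
  - BM25 retrieves the top-k candidates (e.g., top-100)
  - Or a bi-encoder with approximate nearest neighbor (ANN) search returns the top-k
- Candidate reranking:
  - Pass each (query, candidate) pair to the reranker
  - The reranker assigns a relevance score (normalized to 0–1, or a raw score such as a logit)
  - Re-sort the candidates by reranker score, highest first
- Output:
  - Return the reranked list, often truncated to a smaller top-n (n ≤ k)
  - Latency: typically 10–100ms for reranking top-100, depending on model size and hardware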
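The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: `first_stage` is a toy term-overlap ranker standing in for BM25 or ANN search, and `toy_score` is a hypothetical stand-in for a real reranker model (a real system would call a cross-encoder there). All function names and the sample corpus are assumptions made for this sketch.

```python
from typing import Callable, List, Tuple


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Toy stand-in for BM25/ANN: rank by number of shared query terms."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    # Python's sort is stable, so ties keep their corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    n: int,
) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair, re-sort by score, keep the top n."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical reranker: Jaccard overlap of query and document terms."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


corpus = [
    "machine learning and deep learning trends for sports analytics",
    "sports news roundup",
    "machine learning fundamentals and basics",
    "machine learning basics tutorial",
]
query = "machine learning basics"

candidates = first_stage(query, corpus, k=3)       # recall-oriented stage
top = rerank(query, candidates, toy_score, n=2)    # precision-oriented stage

for doc, score in top:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer as a callable keeps the pipeline independent of any particular model, so the toy function can be swapped for a real reranker without changing `rerank`.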
Example
Query: "machine learning basics"
# First-stage BM25 ranking:
1. "Machine learning fundamentals" (BM25: 45.2)
2. "Random article about sports" (BM25: 42.1)
3. "AI and deep learning trends" (BM25: 41.5)
# Reranker re-scoring (numbers are first-stage ranks):
1. "Machine learning fundamentals" (reranker: 0.92)
3. "AI and deep learning trends" (reranker: 0.78)
2. "Random article about sports" (reranker: 0.05)
# Final reranked list:
1. "Machine learning fundamentals"
2. "AI and deep learning trends"
3. "Random article about sports"
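The same example can be reproduced as a short Python sketch. The titles and scores are copied from the example; the dictionary layout and variable names are assumptions chosen for illustration.

```python
# Candidates in first-stage (BM25) order, with the reranker scores from the example.
candidates = [
    {"title": "Machine learning fundamentals", "bm25": 45.2, "rerank": 0.92},
    {"title": "Random article about sports", "bm25": 42.1, "rerank": 0.05},
    {"title": "AI and deep learning trends", "bm25": 41.5, "rerank": 0.78},
]

# Re-sort by reranker score, highest first. BM25 scores only decided which
# documents entered the candidate set; they play no role in the final order.
reranked = sorted(candidates, key=lambda c: c["rerank"], reverse=True)

for rank, c in enumerate(reranked, start=1):
    print(f'{rank}. "{c["title"]}" (reranker: {c["rerank"]:.2f})')
```

Note that the two score scales are never compared with each other: the reranker score replaces the BM25 score as the sort key.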
Variants and history
Early reranking relied on learning-to-rank (LTR) methods in the 2000s. Neural reranking emerged with deep learning: BERT-based passage ranking (2019) and monoBERT demonstrated the effectiveness of cross-encoders. monoT5 extended the approach to sequence-to-sequence models, which rank by generating a relevance token. Modern variants include efficient rerankers built on distilled models (DistilBERT, TinyBERT), multi-stage cascades, and learned ranking functions that combine multiple signals.
When to use it
Use reranking when:
- First-stage recall is high but ranking is imperfect
- You have 50–1000 candidates to rerank
- Query-document interactions matter
- Computational budget allows 10–200ms per query
- The NDCG or MRR improvement justifies the added latency
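Reranking is standard in production systems: a BM25 or dense first stage provides recall, and the reranker provides precision. A typical trade-off is a 5–10% latency increase for a 10–30% ranking improvement, though the actual figures depend on the model, the hardware, and the number of candidates.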
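To judge whether a reranker earns its latency, the ranking metrics can be computed before and after reranking on labeled queries. The sketch below is one common formulation (linear gain, log2 discount) rather than a definitive one; the graded relevance labels are hypothetical, assigned for illustration to the three documents from the example.

```python
import math
from typing import List


def reciprocal_rank(relevance: List[int]) -> float:
    """1 / rank of the first relevant item (label > 0), or 0.0 if none.
    MRR is the mean of this value over a set of queries."""
    for rank, rel in enumerate(relevance, start=1):
        if rel > 0:
            return 1.0 / rank
    return 0.0


def dcg(relevance: List[int], k: int) -> float:
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) over the top k."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg(relevance: List[int], k: int) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevance, reverse=True), k)
    return dcg(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical labels for one query: 2 = highly relevant, 1 = partially, 0 = not.
before = [2, 0, 1]  # first-stage order
after = [2, 1, 0]   # reranked order

print(f"NDCG@3 before: {ndcg(before, 3):.4f}, after: {ndcg(after, 3):.4f}")
print(f"RR before: {reciprocal_rank(before):.2f}, after: {reciprocal_rank(after):.2f}")
```

With these labels NDCG@3 improves while the reciprocal rank does not, because the top result was already relevant before reranking. Which metric to track therefore depends on whether only the first hit matters or the whole ordering does.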