Reranker

Reranker Ranking Second-Stage Cross-Encoder Scoring Needs-Review

What it is

A reranker is a model, typically neural, applied to a small candidate set (typically top-50 to top-500) returned by first-stage retrieval. Rerankers trade added latency for accuracy: they perform fine-grained scoring that captures query-document interactions that first-stage signals do not fully exploit.

[illustrate: First-stage retrieval ranking 100 candidates; reranker re-scoring top-50; reranked final list showing accuracy improvement]

How it works

First-stage retrieval:
- BM25 retrieves top-k (e.g., top-100)
- Or bi-encoder ANN search returns top-k
Candidate reranking:
- Pass each (query, candidate) pair to the reranker
- Reranker assigns a relevance score to each pair (0–1 or raw scores)
- Re-sort candidates by reranker score
Output:
- Return the top-n reranked documents (n ≤ k)
- Latency: typically 10–100ms for reranking top-100, depending on model size and hardware
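
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production implementation: `retrieve_then_rerank`, `term_overlap`, and `toy_reranker` are hypothetical names, the first stage is a crude token-overlap count standing in for BM25 or ANN search, and the "reranker" is a Jaccard similarity standing in for a real cross-encoder.

```python
from typing import Callable, List, Tuple

Scorer = Callable[[str, str], float]  # (query, document) -> relevance score


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage: Scorer,
    reranker: Scorer,
    k: int = 100,
    n: int = 10,
) -> List[Tuple[str, float]]:
    """Two-stage ranking: cheap scoring over the corpus, costly scoring over top-k."""
    # Stage 1: score every document cheaply and keep the k best candidates.
    candidates = sorted(corpus, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: score only those k candidates with the more expensive model.
    rescored = [(doc, reranker(query, doc)) for doc in candidates]
    # Re-sort by reranker score and return the n best.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored[:n]


# Toy stand-ins (assumptions, for illustration only).
def term_overlap(query: str, doc: str) -> float:
    """Crude lexical first stage: number of shared lowercase tokens."""
    return float(len(set(query.lower().split()) & set(doc.lower().split())))


def toy_reranker(query: str, doc: str) -> float:
    """Placeholder for a cross-encoder: Jaccard similarity over tokens."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0
```

In a real system only the two scorer arguments would change; the control flow (score everything cheaply, truncate to k, rescore, truncate to n) stays the same, which is why the reranker's cost scales with k rather than with corpus size.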

Example

Query: "machine learning basics"

# First-stage BM25 ranking:
1. "Machine learning fundamentals" (BM25: 45.2)
2. "Random article about sports" (BM25: 42.1)
3. "AI and deep learning trends" (BM25: 41.5)

# Reranker re-scoring (numbers are first-stage ranks):
1. "Machine learning fundamentals" (reranker: 0.92)
3. "AI and deep learning trends" (reranker: 0.78)
2. "Random article about sports" (reranker: 0.05)

# Final reranked list:
1. "Machine learning fundamentals"
2. "AI and deep learning trends"
3. "Random article about sports"
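
The same example can be reproduced as a short script. It is a sketch that hard-codes the scores listed above rather than calling a real BM25 index or reranker; `rank_by` is a hypothetical helper introduced only for this illustration.

```python
# (title, BM25 score, reranker score) copied from the example above.
candidates = [
    ("Machine learning fundamentals", 45.2, 0.92),
    ("Random article about sports", 42.1, 0.05),
    ("AI and deep learning trends", 41.5, 0.78),
]


def rank_by(items, column):
    """Return titles ordered by the given score column, highest first."""
    return [row[0] for row in sorted(items, key=lambda row: row[column], reverse=True)]


first_stage_order = rank_by(candidates, 1)  # order by BM25 score
reranked_order = rank_by(candidates, 2)     # order by reranker score

# Show each document's new rank next to its first-stage rank.
for new_rank, title in enumerate(reranked_order, start=1):
    old_rank = first_stage_order.index(title) + 1
    print(f"{new_rank}. {title} (was {old_rank})")
```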

Variants and history

Early reranking used learning-to-rank (LTR) methods in the 2000s. Neural reranking emerged with deep learning: BERT-based passage reranking (2019), later known as monoBERT, demonstrated the effectiveness of cross-encoders, which score a query and a document jointly. MonoT5 extended the approach to sequence-to-sequence models, deriving the relevance score from generated tokens. Modern variants include efficient rerankers built on distilled models (DistilBERT, TinyBERT), multi-stage cascades, and learned ranking functions that combine multiple signals.

When to use it

Use reranking when:

- First-stage recall is high but ranking is imperfect
- You have 50–1000 candidates to rerank
- Query-document interactions matter
- Computational budget allows 10–200ms per query
- NDCG or MRR improvements justify the added latency

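Reranking is standard in production systems: BM25 or a dense first stage provides recall, and the reranker provides precision. As a rough rule of thumb, a 5–10% latency increase buys a 10–30% ranking improvement, though the actual trade-off depends on the reranker model, the hardware, and the number of candidates.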
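
Whether the ranking gain justifies the latency is an empirical question, usually answered by comparing a metric such as NDCG before and after reranking. The sketch below does this for the three-document example; the graded relevance labels (2, 1, 0) are assumptions made up for illustration, and the gain function `2^rel - 1` is one common convention among several.

```python
import math
from typing import List


def dcg(relevances: List[int]) -> float:
    """Discounted cumulative gain with the common 2^rel - 1 gain."""
    return sum(
        (2 ** rel - 1) / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances: List[int]) -> float:
    """DCG normalized by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical graded labels for the three example documents:
# 2 = highly relevant, 1 = partially relevant, 0 = irrelevant.
bm25_labels = [2, 0, 1]      # fundamentals, sports, AI trends
reranked_labels = [2, 1, 0]  # fundamentals, AI trends, sports

gain = ndcg(reranked_labels) - ndcg(bm25_labels)
print(f"NDCG@3 first stage: {ndcg(bm25_labels):.3f}")
print(f"NDCG@3 reranked:    {ndcg(reranked_labels):.3f}")
print(f"absolute gain:      {gain:.3f}")
```

On a real evaluation set the same comparison would be averaged over many queries and weighed against the measured per-query latency of the reranker.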