Cross-Encoder

What it is

A cross-encoder (interaction-based ranker) jointly encodes a query-document pair as a single input and produces a relevance score. Unlike bi-encoders, which map queries and documents to separate embeddings, cross-encoders can capture fine-grained query-document interactions, yielding more accurate but more computationally expensive ranking.
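
To make the contrast concrete, here is a minimal sketch of the two scoring interfaces. The bag-of-words "embeddings" and the token-overlap pair scorer are illustrative assumptions standing in for learned models:

```python
import math

def embed(text: str) -> dict[str, int]:
    # Toy "embedding": bag-of-words counts (stand-in for a learned dense vector).
    vec: dict[str, int] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(n * b.get(t, 0) for t, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cross_encoder(query: str, doc: str) -> float:
    # Stand-in for a joint forward pass over the concatenated pair:
    # the scorer sees both texts at once, so it can model interactions.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["machine learning is a field of ai", "the weather is sunny"]

# Bi-encoder style: document vectors are computed once, offline;
# each query needs only one encoding plus cheap similarities.
doc_vecs = [embed(d) for d in docs]
query_vec = embed("what is machine learning")
bi_scores = [cosine(query_vec, v) for v in doc_vecs]

# Cross-encoder style: every (query, document) pair needs its own
# forward pass, so nothing per-document can be precomputed.
cross_scores = [cross_encoder("what is machine learning", d) for d in docs]
```

The cost asymmetry follows directly: for N candidate documents, a bi-encoder does one encoder pass per query, while a cross-encoder does N.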

[illustrate: Query and document concatenated; flowing through single encoder network; output relevance score (0–1)]

How it works

  1. Input preparation:

    • Concatenate query and document: [CLS] query [SEP] document [SEP]
    • Tokenize and embed
  2. Encoding:

    • Pass through transformer encoder (e.g., BERT)
    • Extract [CLS] token representation
  3. Scoring:

    • Pass [CLS] through classification head (linear layer)
    • Output relevance score: 0–1 or ranking score
  4. Usage in retrieval:

    • First stage: retrieve top-100 candidates with bi-encoder or BM25
    • Second stage: re-rank top-100 with cross-encoder
    • Return reranked top-k
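
The four steps above can be sketched end-to-end. The transformer scorer is replaced here by a toy token-overlap function (an assumption for illustration); a real system would run the concatenated input through a fine-tuned model such as BERT at that point.

```python
def make_input(query: str, doc: str) -> str:
    # Step 1: concatenate query and document with special tokens.
    return f"[CLS] {query} [SEP] {doc} [SEP]"

def cross_encoder_score(query: str, doc: str) -> float:
    # Steps 2-3 stand-in: a real cross-encoder would pass make_input()
    # through a transformer and score the [CLS] representation. Token
    # overlap is used here only so the sketch runs without a model.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_k: int) -> list[str]:
    # Step 4: score every first-stage candidate, return the best top_k.
    ranked = sorted(candidates,
                    key=lambda doc: cross_encoder_score(query, doc),
                    reverse=True)
    return ranked[:top_k]

candidates = [  # pretend these came from BM25 or a bi-encoder
    "machine learning is a branch of artificial intelligence",
    "today the weather is sunny and warm",
    "deep learning uses neural networks",
]
print(rerank("what is machine learning", candidates, top_k=2))
```

The "machine learning" document scores highest because it shares the most query tokens; with a trained model the same pipeline shape applies, only the scorer changes.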

Example

# First stage: BM25 or bi-encoder retrieves candidates
candidates = ["doc_1", "doc_2", ..., "doc_100"]

# Second stage: score each candidate with the cross-encoder
scores = {}
for doc in candidates:
    scores[doc] = cross_encoder("What is machine learning?", doc)
    # a score near 0.85 indicates high relevance

# Reorder by cross-encoder scores
ranked = sorted(candidates, key=lambda doc: scores[doc], reverse=True)

Variants and history

Cross-encoders emerged in NLP from BERT fine-tuning for sentence-pair classification. The MS MARCO passage-ranking task (2019) popularized cross-encoders for ranking. MonoBERT and MonoT5 demonstrated that cross-encoder reranking significantly improves first-stage retrieval results. Modern variants include efficient cross-encoders built on distilled models (e.g., DistilBERT), multi-stage cascades, and learned combination with first-stage signals.

When to use it

Use cross-encoders when:

  • You have a small candidate set to rerank (50–500 docs)
  • Ranking accuracy is more important than speed
  • Query-document interactions are complex or subtle
  • Computational budget allows 10–100ms per query
  • Combining first-stage retrieval (BM25, dense) with second-stage reranking

Cross-encoders are slower than bi-encoders because each (query, candidate) pair requires its own forward pass, but they are more accurate. Standard practice: retrieve 100–1000 candidates with the first stage, then rerank the top 100 with a cross-encoder.
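
A back-of-envelope budget makes the trade-off concrete. The per-pass cost below is a hypothetical round number for illustration, not a measured figure:

```python
encoder_pass_ms = 1.0   # assumed cost of one transformer forward pass
rerank_k = 100          # candidates sent to the cross-encoder

# Bi-encoder: one pass for the query; document vectors are precomputed offline.
bi_encoder_ms = 1 * encoder_pass_ms

# Cross-encoder: one pass per (query, candidate) pair; nothing is precomputable.
cross_encoder_ms = rerank_k * encoder_pass_ms

print(f"bi-encoder per query:  {bi_encoder_ms:.0f} ms")
print(f"cross-encoder rerank:  {cross_encoder_ms:.0f} ms")
```

Under this assumption, reranking 100 candidates lands in the 10–100ms budget mentioned above, and the cost scales linearly with the candidate count and per-pass latency.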

See also