Cross-Encoder
What it is
A cross-encoder (interaction-based ranker) jointly encodes a query-document pair as a single input and produces a relevance score. Unlike bi-encoders, which map queries and documents to separate embeddings, cross-encoders can capture fine-grained query-document interactions, yielding more accurate but computationally more expensive ranking.
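To make the contrast concrete, here is a toy sketch in pure Python. The names `embed`, `bi_encoder_score`, and `cross_encoder_score` are hypothetical, and the "models" are trivial stand-ins (a bag-of-characters vector and word overlap), not real encoders; the point is only that the bi-encoder compares two independently computed vectors, while the cross-encoder sees both texts in one pass.

```python
import math

def embed(text):
    # Toy "encoder": normalized bag-of-characters vector (stand-in for a real model)
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def bi_encoder_score(query, doc):
    # Bi-encoder: query and document are encoded separately, then compared
    q, d = embed(query), embed(doc)
    return sum(a * b for a, b in zip(q, d))

def cross_encoder_score(query, doc):
    # Cross-encoder stand-in: sees both texts jointly, so it can use
    # token-level interactions (here crudely modeled as shared words)
    q_tokens, d_tokens = set(query.lower().split()), set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

print(bi_encoder_score("machine learning", "learning machines"))
print(cross_encoder_score("machine learning", "machine learning is fun"))
```

A real cross-encoder replaces the overlap heuristic with a transformer that attends across both texts, which is exactly what makes it both more accurate and more expensive.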
[illustrate: Query and document concatenated; flowing through single encoder network; output relevance score (0–1)]
How it works
- Input preparation:
  - Concatenate query and document: [CLS] query [SEP] document [SEP]
  - Tokenize and embed
- Encoding:
  - Pass through a transformer encoder (e.g., BERT)
  - Extract the [CLS] token representation
- Scoring:
  - Pass [CLS] through a classification head (linear layer)
  - Output a relevance score: 0–1 or a ranking score
- Usage in retrieval:
  - First stage: retrieve top-100 candidates with a bi-encoder or BM25
  - Second stage: re-rank the top-100 with the cross-encoder
  - Return the reranked top-k
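The preparation, encoding, and scoring steps can be sketched as follows. This is a minimal illustration, not a real model: `prepare_input`, `encode`, and `score` are hypothetical names, the character-sum "encoder" and the head weights are made up, and a real system would use a fine-tuned transformer for steps 2 and 3.

```python
import math

def prepare_input(query, doc):
    # Step 1: concatenate with special tokens, then "tokenize" by whitespace
    return ["[CLS]"] + query.split() + ["[SEP]"] + doc.split() + ["[SEP]"]

def encode(tokens, dim=4):
    # Step 2: stand-in encoder producing a fake [CLS] representation.
    # A real encoder (e.g. BERT) attends across query and document tokens.
    cls = [0.0] * dim
    for i, tok in enumerate(tokens):
        cls[i % dim] += (sum(ord(c) for c in tok) % 100) / 100.0
    return cls

def score(cls_vec, weights=(0.3, -0.1, 0.2, 0.4), bias=0.0):
    # Step 3: linear classification head + sigmoid -> relevance in (0, 1)
    logit = sum(w * x for w, x in zip(weights, cls_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))

tokens = prepare_input("what is ml", "ml is machine learning")
relevance = score(encode(tokens))
print(round(relevance, 3))
```

Because the single input contains both texts, every attention layer in a real encoder can match query tokens directly against document tokens, which is the source of the accuracy gain over bi-encoders.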
Example
# First stage: BM25 or a bi-encoder retrieves candidates
candidates = ["doc_1", "doc_2"]  # ... through "doc_100"
query = "What is machine learning?"

# Second stage: score every candidate with the cross-encoder
scores = {doc: cross_encoder(query, doc) for doc in candidates}
# a score near 0.85 indicates high relevance

# Reorder by cross-encoder score, highest first
ranked = sorted(candidates, key=lambda doc: scores[doc], reverse=True)
Variants and history
Cross-encoders emerged in NLP through BERT fine-tuning for sentence-pair classification. The MS MARCO Passage Ranking task (2019) popularized them for ranking. MonoBERT and MonoT5 demonstrated that cross-encoder reranking significantly improves over first-stage retrieval (e.g., BM25). Modern variants include efficient cross-encoders (e.g., DistilBERT-based), multi-stage cascades, and learned combinations with first-stage signals.
When to use it
Use cross-encoders when:
- You have a small candidate set to rerank (50–500 docs)
- Ranking accuracy is more important than speed
- Query-document interactions are complex or subtle
- Computational budget allows 10–100ms per query
- Combining first-stage retrieval (BM25, dense) with second-stage reranking
Cross-encoders are slower than bi-encoders (the model must re-encode each query-document pair from scratch, so cost scales with the number of candidates) but more accurate. Standard practice: retrieve 100–1000 candidates with the first stage, then rerank the top-100 with the cross-encoder.
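The two-stage pattern described above can be sketched end to end. All scoring functions here are illustrative stand-ins (simple word overlap for the first stage, phrase containment for the cross-encoder), and the names `first_stage_score`, `cross_encoder_score`, and `retrieve_then_rerank` are assumptions, not a real library API; the structure of the pipeline is the point.

```python
def first_stage_score(query, doc):
    # Cheap lexical stand-in for BM25 / bi-encoder retrieval
    q = set(query.lower().split())
    return sum(1 for tok in doc.lower().split() if tok in q)

def cross_encoder_score(query, doc):
    # Expensive joint-scorer stand-in: rewards exact phrase containment
    if query.lower() in doc.lower():
        return 1.0
    return first_stage_score(query, doc) / (len(doc.split()) + 1)

def retrieve_then_rerank(query, corpus, first_k=100, final_k=10):
    # Stage 1: score the whole corpus cheaply, keep the top first_k
    candidates = sorted(corpus, key=lambda d: first_stage_score(query, d),
                        reverse=True)[:first_k]
    # Stage 2: run the cross-encoder only on the small candidate set
    return sorted(candidates, key=lambda d: cross_encoder_score(query, d),
                  reverse=True)[:final_k]

corpus = [
    "machine learning is a field of ai",
    "learning to cook machine parts",
    "what is machine learning",
    "deep learning and neural networks",
]
print(retrieve_then_rerank("machine learning", corpus, first_k=3, final_k=2))
```

The key property to notice is the cost asymmetry: the cheap first-stage scorer touches every document, while the expensive cross-encoder only runs on `first_k` candidates, which is what keeps the 10–100ms-per-query budget achievable.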