Cross-Encoder

What it is

A cross-encoder (interaction-based ranker) jointly encodes a query-document pair as a single input and produces a relevance score. Unlike bi-encoders, which map queries and documents to separate embeddings, cross-encoders can capture fine-grained query-document interactions, yielding more accurate but more computationally expensive ranking.
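
To make the contrast concrete, here is a minimal sketch of the two scoring interfaces. The bag-of-words "embeddings" and the token-overlap pair scorer are illustrative assumptions standing in for learned models:

```python
import math

def embed(text: str) -> dict[str, int]:
    # Toy "embedding": bag-of-words counts (stand-in for a learned dense vector).
    vec: dict[str, int] = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(a: dict[str, int], b: dict[str, int]) -> float:
    dot = sum(n * b.get(t, 0) for t, n in a.items())
    na = math.sqrt(sum(n * n for n in a.values()))
    nb = math.sqrt(sum(n * n for n in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def cross_encoder(query: str, doc: str) -> float:
    # Stand-in for a joint forward pass over the concatenated pair:
    # the scorer sees both texts at once, so it can model interactions.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

docs = ["machine learning is a field of ai", "the weather is sunny"]

# Bi-encoder style: document vectors are computed once, offline;
# each query needs only one encoding plus cheap similarities.
doc_vecs = [embed(d) for d in docs]
query_vec = embed("what is machine learning")
bi_scores = [cosine(query_vec, v) for v in doc_vecs]

# Cross-encoder style: every (query, document) pair needs its own
# forward pass, so nothing per-document can be precomputed.
cross_scores = [cross_encoder("what is machine learning", d) for d in docs]
```

The cost asymmetry follows directly: for N candidate documents, a bi-encoder does one encoder pass per query, while a cross-encoder does N.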

[illustrate: Query and document concatenated; flowing through single encoder network; output relevance score (0–1)]

How it works

  1. Input preparation:

    • Concatenate query and document: [CLS] query [SEP] document [SEP]
    • Tokenize and embed
  2. Encoding:

    • Pass through transformer encoder (e.g., BERT)
    • Extract [CLS] token representation
  3. Scoring:

    • Pass [CLS] through classification head (linear layer)
    • Output relevance score: 0–1 or ranking score
  4. Usage in retrieval:

    • First stage: retrieve top-100 candidates with bi-encoder or BM25
    • Second stage: re-rank top-100 with cross-encoder
    • Return reranked top-k
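
The four steps above can be sketched end-to-end. The transformer scorer is replaced here by a toy token-overlap function (an assumption for illustration); a real system would run the concatenated input through a fine-tuned model such as BERT at that point.

```python
def make_input(query: str, doc: str) -> str:
    # Step 1: concatenate query and document with special tokens.
    return f"[CLS] {query} [SEP] {doc} [SEP]"

def cross_encoder_score(query: str, doc: str) -> float:
    # Steps 2-3 stand-in: a real cross-encoder would pass make_input()
    # through a transformer and score the [CLS] representation. Token
    # overlap is used here only so the sketch runs without a model.
    q = set(query.lower().split())
    d = set(doc.lower().split())
    return len(q & d) / max(len(q), 1)

def rerank(query: str, candidates: list[str], top_k: int) -> list[str]:
    # Step 4: score every first-stage candidate, return the best top_k.
    ranked = sorted(candidates,
                    key=lambda doc: cross_encoder_score(query, doc),
                    reverse=True)
    return ranked[:top_k]

candidates = [  # pretend these came from BM25 or a bi-encoder
    "machine learning is a branch of artificial intelligence",
    "today the weather is sunny and warm",
    "deep learning uses neural networks",
]
print(rerank("what is machine learning", candidates, top_k=2))
```

The "machine learning" document scores highest because it shares the most query tokens; with a trained model the same pipeline shape applies, only the scorer changes.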

Example

# First stage: BM25 or bi-encoder retrieves candidates
candidates = ["doc_1", "doc_2", ..., "doc_100"]

# Second stage: score each candidate with the cross-encoder
scores = {}
for doc in candidates:
    scores[doc] = cross_encoder("What is machine learning?", doc)
    # a score near 0.85 indicates high relevance

# Reorder by cross-encoder scores
ranked = sorted(candidates, key=lambda doc: scores[doc], reverse=True)

Variants and history

Cross-encoders emerged in NLP from BERT fine-tuning for sentence-pair classification. The MS MARCO passage-ranking task (2019) popularized cross-encoders for ranking. MonoBERT and MonoT5 demonstrated that cross-encoder reranking significantly improves first-stage retrieval results. Modern variants include efficient cross-encoders built on distilled models (e.g., DistilBERT), multi-stage cascades, and learned combination with first-stage signals.

When to use it

Use cross-encoders when:

  • You have a small candidate set to rerank (50–500 docs)
  • Ranking accuracy is more important than speed
  • Query-document interactions are complex or subtle
  • Computational budget allows 10–100ms per query
  • Combining first-stage retrieval (BM25, dense) with second-stage reranking

Cross-encoders are slower than bi-encoders because each (query, candidate) pair requires its own forward pass, but they are more accurate. Standard practice: retrieve 100–1000 candidates with the first stage, then rerank the top 100 with a cross-encoder.
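
A back-of-envelope budget makes the trade-off concrete. The per-pass cost below is a hypothetical round number for illustration, not a measured figure:

```python
encoder_pass_ms = 1.0   # assumed cost of one transformer forward pass
rerank_k = 100          # candidates sent to the cross-encoder

# Bi-encoder: one pass for the query; document vectors are precomputed offline.
bi_encoder_ms = 1 * encoder_pass_ms

# Cross-encoder: one pass per (query, candidate) pair; nothing is precomputable.
cross_encoder_ms = rerank_k * encoder_pass_ms

print(f"bi-encoder per query:  {bi_encoder_ms:.0f} ms")
print(f"cross-encoder rerank:  {cross_encoder_ms:.0f} ms")
```

Under this assumption, reranking 100 candidates lands in the 10–100ms budget mentioned above, and the cost scales linearly with the candidate count and per-pass latency.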

See also