MonoBERT

What it is

MonoBERT (Nogueira & Cho, 2019) is a cross-encoder reranker: it concatenates a query and a candidate passage, feeds the combined input to BERT, and uses the [CLS] representation to predict a relevance score. It is one of the simplest BERT-based rerankers and became a standard baseline after reporting strong results on MS MARCO. The “mono” in the name refers to pointwise scoring, one passage at a time, as opposed to pairwise (DuoBERT) or listwise variants.

[illustrate: [CLS] query [SEP] passage [SEP] → BERT → [CLS] representation → linear → relevance score]
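
As a rough illustration of the input side of this pipeline, the sketch below packs a query and a passage into a single sequence with segment ids. The function name pack_input is hypothetical, whitespace splitting stands in for BERT's WordPiece tokenizer, and the 64-token query limit and 512-token total limit are adjustable assumptions modeled on the settings reported by Nogueira & Cho.

```python
from typing import List, Tuple


def pack_input(query: str, passage: str,
               max_query_tokens: int = 64,
               max_total_tokens: int = 512) -> Tuple[List[str], List[int]]:
    """Build the [CLS] query [SEP] passage [SEP] sequence and its segment ids."""
    # Assumption: whitespace splitting stands in for WordPiece tokenization.
    q = query.split()[:max_query_tokens]
    # Reserve three positions for [CLS] and the two [SEP] tokens.
    budget = max(0, max_total_tokens - len(q) - 3)
    p = passage.split()[:budget]
    tokens = ["[CLS]"] + q + ["[SEP]"] + p + ["[SEP]"]
    # Segment 0 covers [CLS], the query, and the first [SEP];
    # segment 1 covers the passage and the final [SEP].
    segment_ids = [0] * (len(q) + 2) + [1] * (len(p) + 1)
    return tokens, segment_ids


tokens, segment_ids = pack_input("what is bm25", "BM25 is a ranking function")
print(tokens)
print(segment_ids)
```

Truncating the passage rather than the query keeps the whole query visible to the model, at the cost of dropping the tail of long passages.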

How it works

  1. Input format:

    [CLS] query tokens [SEP] passage tokens [SEP]
    
  2. Scoring:

    • BERT encodes the concatenated input with full self-attention between query and passage tokens
    • Linear layer on [CLS] output → logits for the non-relevant and relevant classes
    • Softmax over the two classes; the probability of the relevant class is the relevance score
  3. Training:

    • Binary cross-entropy: positive (relevant) vs. negative (non-relevant) passages
    • Negatives: passages sampled from the BM25 top-k that are not labeled relevant
  4. Inference:

    • Score each candidate passage independently (pointwise)
    • Rerank the results of BM25 or another first-stage retriever by MonoBERT score
    • Latency: O(n) encoder calls for n candidates, since each (query, passage) pair needs its own forward pass
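
The scoring, training loss, and inference steps above can be sketched in a few lines of plain Python. This is a minimal sketch, not the authors' implementation: toy_logits is a hypothetical stand-in for the BERT encoder plus linear layer (it uses term overlap instead of a learned model), and the names relevance_probability, cross_entropy, and rerank are illustrative.

```python
import math
from typing import Callable, List, Sequence, Tuple


def relevance_probability(logits: Sequence[float]) -> float:
    """Softmax over [non-relevant, relevant] logits; return P(relevant)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)


def cross_entropy(prob_relevant: float, label: int) -> float:
    """Binary cross-entropy for one (query, passage) pair; label is 1 or 0."""
    eps = 1e-12  # guard against log(0)
    p = min(max(prob_relevant, eps), 1.0 - eps)
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def rerank(query: str, candidates: List[str],
           logits_fn: Callable[[str, str], Sequence[float]]
           ) -> List[Tuple[float, str]]:
    """Score each candidate independently (pointwise), then sort by score."""
    scored = [(relevance_probability(logits_fn(query, p)), p)
              for p in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_logits(query: str, passage: str) -> List[float]:
    """Hypothetical stand-in for BERT + linear layer (term overlap only)."""
    overlap = len(set(query.lower().split()) & set(passage.lower().split()))
    return [0.0, float(overlap) - 1.0]


candidates = ["cats sleep a lot",
              "tf-idf is older",
              "bm25 is a ranking function"]
ranked = rerank("what is bm25", candidates, toy_logits)
for score, passage in ranked:
    print(f"{score:.3f}  {passage}")
```

In a real system, logits_fn would run the fine-tuned model on the packed input; because each candidate is scored independently, the calls can be batched but not shared across passages.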

Variants and history

MonoBERT (2019) was among the first works to show that BERT, applied as a cross-encoder, substantially outperforms earlier ranking models on MS MARCO passage ranking. It popularized the retrieve-then-rerank pipeline for neural ranking. Follow-on work: MonoT5 replaced BERT with a seq2seq model; DuoBERT added a pairwise stage on top of MonoBERT; RankT5 fine-tuned T5 with ranking losses, including listwise ones. MonoBERT-large on MS MARCO passage is still commonly reported as a baseline.

When to use it

Use MonoBERT when:

  • A reranking stage is feasible in your latency budget
  • MS MARCO-style labeled data is available for fine-tuning
  • You want a simple, well-understood reranker to compare against
  • The candidate set is small enough for per-passage BERT inference (typically 100 to 1,000 passages at most)
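
To check whether reranking fits a latency budget, a back-of-the-envelope estimate is often enough. The sketch below assumes candidates are scored in fixed-size batches that run one after another; the batch size and per-batch time are hypothetical numbers that should be replaced with measurements from your own hardware.

```python
import math


def rerank_latency_ms(num_candidates: int, batch_size: int,
                      ms_per_batch: float) -> float:
    """Estimate latency when batches are scored sequentially."""
    num_batches = math.ceil(num_candidates / batch_size)
    return num_batches * ms_per_batch


def max_candidates(budget_ms: float, batch_size: int,
                   ms_per_batch: float) -> int:
    """Largest candidate count whose batches fit inside the budget."""
    return int(budget_ms // ms_per_batch) * batch_size


# Hypothetical figures: 32 passages per batch, 40 ms per batch.
print(rerank_latency_ms(1000, 32, 40.0))  # 32 batches * 40 ms = 1280.0
print(max_candidates(200.0, 32, 40.0))    # 5 batches * 32 = 160
```

Under these assumed figures, reranking 1,000 candidates takes over a second, while a 200 ms budget leaves room for about 160.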

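For larger candidate sets or tighter latency, consider a late-interaction model such as ColBERT, which precomputes passage token embeddings offline and can act as the retriever itself, so no separate cross-encoder pass is needed. MonoT5 is usually chosen for effectiveness rather than speed, since it also runs one full model pass per candidate.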

See also