MonoBERT

What it is

MonoBERT (Nogueira & Cho, 2019) is a cross-encoder reranker that concatenates a query and a candidate passage, feeds the combined input to BERT, and uses the [CLS] representation to predict a relevance score. It is one of the simplest BERT-based rerankers and became a standard baseline after demonstrating strong results on MS MARCO. The “mono” in the name refers to pointwise scoring: each passage is scored on its own, as opposed to pairwise or listwise variants that compare candidates with one another.

[illustrate: [CLS] query [SEP] passage [SEP] → BERT → [CLS] representation → linear → relevance score]

How it works

Input format:

[CLS] query tokens [SEP] passage tokens [SEP]
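A minimal sketch of assembling this input, assuming the query and passage are already tokenized into lists of strings. The function name `build_input` and the length limits (512 total tokens, 64 for the query) are illustrative assumptions, not requirements of the model; a real setup would use the tokenizer that ships with the BERT checkpoint.

```python
def build_input(query_tokens, passage_tokens, max_len=512, max_query_len=64):
    """Assemble [CLS] query [SEP] passage [SEP] plus segment ids.

    max_len and max_query_len are assumed defaults for illustration.
    """
    query = query_tokens[:max_query_len]
    # Three positions are reserved for [CLS] and the two [SEP] markers.
    room = max_len - len(query) - 3
    passage = passage_tokens[:max(room, 0)]
    tokens = ["[CLS]"] + query + ["[SEP]"] + passage + ["[SEP]"]
    # Segment 0 covers [CLS], the query, and the first [SEP]; segment 1 the rest.
    segment_ids = [0] * (len(query) + 2) + [1] * (len(passage) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_input(
    ["what", "is", "bm25"],
    ["bm25", "is", "a", "ranking", "function"],
)
```

The passage, not the query, absorbs the truncation here, on the assumption that queries are short and should be kept whole where possible.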

Scoring:
- BERT encodes the concatenated input with full self-attention between query and passage tokens
- Linear layer on [CLS] output → two logits (non-relevant / relevant)
- Softmax over the two classes; the probability of the relevant class is the relevance score
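The scoring head can be sketched in plain Python, assuming a two-class linear layer whose softmax probability for the relevant class serves as the score. The function name `relevance_score`, the class ordering (index 1 = relevant), and the toy weights are assumptions for illustration; in practice the [CLS] vector comes from BERT and the weights are learned.

```python
import math


def relevance_score(cls_vector, weights, bias):
    """Map a [CLS] vector to P(relevant) via a 2-class linear layer + softmax.

    weights has two rows: index 0 = non-relevant, index 1 = relevant
    (this ordering is an assumption of the sketch).
    """
    logits = [
        sum(w * x for w, x in zip(row, cls_vector)) + b
        for row, b in zip(weights, bias)
    ]
    # Subtract the max logit before exponentiating for numerical stability.
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    return exps[1] / sum(exps)


# Toy 3-dimensional [CLS] vector with hand-picked parameters.
score = relevance_score(
    [0.5, -1.0, 2.0],
    weights=[[0.1, 0.2, -0.3], [0.4, -0.1, 0.5]],
    bias=[0.0, 0.0],
)
```

With two classes, the softmax probability equals a sigmoid of the logit difference, so a single-logit head with a sigmoid is an equivalent formulation.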

Training:
- Binary cross-entropy: positive (relevant) vs. negative (non-relevant) passages
- Negatives: passages from the BM25 top-k that are not labeled relevant
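As a rough sketch of the training objective, the function below computes mean binary cross-entropy over a batch of scored passages. The name `pointwise_loss`, the `eps` clamp, and the averaging over the batch are assumptions of this example; the scores stand in for model outputs.

```python
import math


def pointwise_loss(scores, labels, eps=1e-12):
    """Mean binary cross-entropy over (score, label) pairs.

    scores are P(relevant) in [0, 1]; labels are 1 for relevant passages
    and 0 for sampled negatives. eps guards against log(0).
    """
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1.0 - eps)
        total += -(y * math.log(s) + (1 - y) * math.log(1.0 - s))
    return total / len(scores)


# One positive scored 0.9 and two BM25 negatives scored 0.2 and 0.1.
loss = pointwise_loss([0.9, 0.2, 0.1], [1, 0, 0])
```

Each passage contributes to the loss independently of the others, which is what makes the objective pointwise.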

Inference:
- Score each candidate passage independently (pointwise)
- Re-rank BM25 or first-stage retrieval results by MonoBERT score
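- Latency: one BERT forward pass per candidate, so O(n) encoder calls for n candidates; scores cannot be precomputed because each one depends on the query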
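The inference loop can be sketched as follows. `rerank` and `overlap_score` are hypothetical names, and `overlap_score` is a trivial term-overlap stand-in for the model so that the example runs without any dependencies; in a real pipeline `score_fn` would wrap a fine-tuned BERT forward pass.

```python
def rerank(query, candidates, score_fn, top_k=None):
    """Score each candidate independently and sort by descending score."""
    scored = [(score_fn(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k] if top_k is not None else scored


def overlap_score(query, passage):
    """Hypothetical stand-in for the model: fraction of query terms in the passage."""
    q_terms = set(query.lower().split())
    p_terms = set(passage.lower().split())
    return len(q_terms & p_terms) / len(q_terms)


ranked = rerank(
    "what is bm25",
    ["the weather is nice", "bm25 is a ranking function", "what is bm25 used for"],
    overlap_score,
)
```

Because `score_fn` is called once per candidate, the cost grows linearly with the size of the first-stage result list.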

Variants and history

Nogueira & Cho (2019) were among the first to show that BERT, applied as a cross-encoder, outperforms traditional learning-to-rank by a large margin on MS MARCO. The result popularized the retrieve-then-rerank pipeline with a BERT reranker on top of a first-stage retriever such as BM25. Follow-on work: MonoT5 replaced BERT with a seq2seq model; DuoBERT added a pairwise stage on top of MonoBERT; RankT5 fine-tuned T5 with ranking losses, including listwise ones. MonoBERT-large on MS MARCO passage is still commonly reported as a baseline.

When to use it

Use MonoBERT when:

- A reranking stage fits within your latency budget
- MS MARCO-style labeled data is available for fine-tuning
- You want a simple, well-understood reranker to compare against
- The candidate set is small enough for per-passage BERT inference (typically on the order of 100 to 1,000 passages)

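For larger candidate sets or tighter latency, consider a late-interaction model such as ColBERT, which precomputes passage representations and can be used without a separate reranking stage. MonoT5 is also a cross-encoder with one forward pass per candidate, so it is usually chosen for effectiveness rather than speed.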
