MonoBERT
What it is
MonoBERT (Nogueira & Cho, 2019) is a cross-encoder reranker: it concatenates a query and a candidate passage, feeds the combined input to BERT, and uses the [CLS] representation to predict a relevance score. It is one of the simplest neural rerankers and became a standard baseline after demonstrating strong results on MS MARCO. The “mono” in the name refers to pointwise scoring, in which each passage is scored on its own (as opposed to pairwise or listwise variants).
[illustrate: [CLS] query [SEP] passage [SEP] → BERT → [CLS] representation → linear → relevance score]
How it works
- Input format: [CLS] query tokens [SEP] passage tokens [SEP]
- Scoring:
  - BERT encodes the concatenated input with full self-attention between query and passage tokens
  - Linear layer on [CLS] output → scalar relevance score
  - Softmax over positive/negative classes
- Training:
  - Binary cross-entropy: positive (relevant) vs. negative (non-relevant) passages
  - Negatives sampled from BM25 top-k (non-relevant)
- Inference:
  - Score each candidate passage independently (pointwise)
  - Re-rank BM25 or first-stage retrieval results by MonoBERT score
  - Latency: O(n) encoder calls for n candidates
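The scoring, training, and inference steps above can be sketched in a few lines of plain Python. This is a minimal illustration rather than the original implementation: tokens are pre-split strings instead of WordPiece ids, and toy_encode is a hypothetical stand-in for BERT plus the linear layer that returns two logits from simple term overlap. The function names and the max_len default of 512 (BERT's usual input limit) are assumptions made for this example.

```python
import math
from typing import Callable, List, Sequence, Tuple

Logits = Tuple[float, float]  # (non-relevant logit, relevant logit)


def build_input(query: Sequence[str], passage: Sequence[str], max_len: int = 512) -> List[str]:
    """Return [CLS] query [SEP] passage [SEP], truncating the passage to fit.

    Assumes the query alone is shorter than max_len - 3 tokens.
    """
    room = max(max_len - len(query) - 3, 0)  # 3 slots go to special tokens
    return ["[CLS]", *query, "[SEP]", *passage[:room], "[SEP]"]


def relevance_probability(logits: Logits) -> float:
    """Softmax over (non-relevant, relevant); return P(relevant)."""
    m = max(logits)  # subtract the max for numerical stability
    exp_neg, exp_pos = (math.exp(x - m) for x in logits)
    return exp_pos / (exp_neg + exp_pos)


def binary_cross_entropy(prob_relevant: float, label: int) -> float:
    """Training loss for one (query, passage) pair; label is 1 if relevant, else 0."""
    eps = 1e-12
    p = min(max(prob_relevant, eps), 1.0 - eps)  # keep log() finite
    return -math.log(p) if label == 1 else -math.log(1.0 - p)


def rerank(
    query: Sequence[str],
    passages: Sequence[Sequence[str]],
    encode: Callable[[List[str]], Logits],
) -> List[Tuple[float, int]]:
    """Score each passage independently (pointwise), then sort by descending score."""
    scored = []
    for idx, passage in enumerate(passages):  # one encoder call per candidate: O(n)
        tokens = build_input(query, passage)
        scored.append((relevance_probability(encode(tokens)), idx))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def toy_encode(tokens: List[str]) -> Logits:
    """Hypothetical stand-in for BERT + linear layer: logit from term overlap."""
    first_sep = tokens.index("[SEP]")
    query_terms = set(tokens[1:first_sep])
    passage_terms = tokens[first_sep + 1:-1]
    overlap = sum(1 for t in passage_terms if t in query_terms)
    return (0.0, float(overlap) - 1.0)


# Example: two first-stage candidates reranked for one query.
query = ["what", "is", "bert"]
passages = [
    ["cats", "sleep", "a", "lot"],
    ["bert", "is", "a", "transformer", "encoder"],
]
ranking = rerank(query, passages, toy_encode)  # list of (score, passage index)
```

In a real system, encode would be replaced by a batched forward pass through a fine-tuned BERT model; the pointwise loop and the final sort stay the same.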
Variants and history
The MonoBERT paper (2019) was among the first to show that BERT, applied as a cross-encoder, substantially outperforms earlier ranking approaches on MS MARCO. It helped establish retrieve-then-rerank with a neural cross-encoder as the standard pipeline. Follow-on work: MonoT5 replaced BERT with a seq2seq model; DuoBERT added a pairwise stage on top of MonoBERT; RankT5 fine-tuned T5 with ranking losses, including listwise ones. MonoBERT-large remains a competitive baseline on MS MARCO passage ranking years later.
When to use it
Use MonoBERT when:
- A reranking stage fits within your latency budget
- MS MARCO-style labeled data is available for fine-tuning
- You want a simple, well-understood reranker to compare against
- The candidate set is small enough for per-passage BERT inference (typically on the order of 100 to 1,000 passages)
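For larger candidate sets or tighter latency, consider ColBERT, whose late-interaction design lets passage representations be precomputed and can retrieve end to end without an explicit reranker. MonoT5 is an alternative pointwise reranker, but like MonoBERT it needs one model call per candidate.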
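Whether a candidate set is small enough can be estimated with simple arithmetic: one forward pass per batch, multiplied by the number of batches. The sketch below assumes a fixed per-batch latency; the function names and all numbers in the example are hypothetical placeholders, not measurements.

```python
import math


def rerank_latency_ms(num_candidates: int, batch_size: int, ms_per_batch: float) -> float:
    """Estimated time to score every candidate, one forward pass per batch."""
    return math.ceil(num_candidates / batch_size) * ms_per_batch


def max_candidates(budget_ms: float, batch_size: int, ms_per_batch: float) -> int:
    """Largest candidate count whose full batches fit inside the budget."""
    return int(budget_ms // ms_per_batch) * batch_size


# Hypothetical figures: 32 passages per batch, 20 ms per batch.
estimate = rerank_latency_ms(1000, batch_size=32, ms_per_batch=20.0)
limit = max_candidates(200.0, batch_size=32, ms_per_batch=20.0)
```

Measuring ms_per_batch on the target hardware, at the sequence lengths actually served, gives a far more reliable figure than any default.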