MonoT5
What it is
MonoT5 (Nogueira et al., 2020) is a reranking model that frames relevance scoring as text generation. Given a query and a passage, T5 is prompted to generate either “true” (relevant) or “false” (not relevant). The log-probability of “true”, computed relative to “false”, serves as the relevance score. This formulation reuses T5’s text-to-text pre-training more directly than a separate classification head would, and the authors report that it transfers well to new domains without retraining.
[illustrate: Prompt with query + passage → T5 → token probabilities for “true”/“false” → relevance score = log P(“true”)]
How it works
Prompt template:
Query: {query} Document: {passage} Relevant:
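As a minimal sketch, the template can be filled in with a small helper. The function name build_prompt is hypothetical, and the whitespace normalization is an assumption of this sketch, not something the template requires. A real pipeline would also truncate the passage to fit the model's input length.

```python
def build_prompt(query: str, passage: str) -> str:
    """Fill the MonoT5 input template with one query-passage pair."""
    # Collapse runs of whitespace (including newlines) so the template
    # markers stay on a single line. This is an assumption of the sketch.
    query = " ".join(query.split())
    passage = " ".join(passage.split())
    return f"Query: {query} Document: {passage} Relevant:"


prompt = build_prompt("what is monot5", "MonoT5 is a T5-based\nreranker.")
print(prompt)
```

The model is then expected to continue this text with “true” or “false”.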
Scoring:
- Run the model on the prompt for a single decoder step
- Record the log-probabilities of the “true” and “false” tokens at that step
- Relevance score = log P("▁true") − log P("▁false")
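The scoring steps above can be sketched in plain Python, assuming the first-step decoder logits have already been obtained from the model. The function names, the toy four-token vocabulary, and the token positions TRUE_ID and FALSE_ID are all hypothetical stand-ins; in practice the ids would come from the T5 tokenizer and the logits from the fine-tuned model.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of vocabulary logits."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def relevance_score(first_step_logits, true_id, false_id):
    """log P(true) - log P(false) at the first decoder position."""
    log_probs = log_softmax(first_step_logits)
    return log_probs[true_id] - log_probs[false_id]


def rerank(candidates, true_id, false_id):
    """Sort (passage_id, first_step_logits) pairs by score, best first."""
    return sorted(
        candidates,
        key=lambda c: relevance_score(c[1], true_id, false_id),
        reverse=True,
    )


# Toy 4-token vocabulary; positions 1 and 2 stand in for "▁true" and "▁false".
# The logit values are made up for illustration.
TRUE_ID, FALSE_ID = 1, 2
candidates = [
    ("p1", [0.0, 0.2, 1.5, 0.3]),
    ("p2", [0.1, 3.1, -0.4, 0.0]),
    ("p3", [0.5, 1.0, 0.9, 0.2]),
]
print([pid for pid, _ in rerank(candidates, TRUE_ID, FALSE_ID)])
# ['p2', 'p3', 'p1']
```

Because both log-probabilities share the same softmax normalizer, the score equals the raw logit difference. Ranking by it gives the same order as ranking by the probability of “true” under a softmax restricted to the two tokens.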
Training:
- Fine-tune T5 on MS MARCO (or domain data) with binary relevance labels
- Teacher forcing with “true”/“false” as targets
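A rough sketch of how labelled pairs could be turned into sequence-to-sequence training examples, and what the per-example loss looks like, follows. The helper names and toy logits are hypothetical, and the loss is simplified to the first target token only; a real T5 fine-tuning run would also include the end-of-sequence token and would use a training framework instead of hand-written math.

```python
import math


def make_example(query, passage, is_relevant):
    """Turn one labelled pair into (input_text, target_text)."""
    source = f"Query: {query} Document: {passage} Relevant:"
    target = "true" if is_relevant else "false"
    return source, target


def target_token_loss(first_step_logits, target_id):
    """Cross-entropy of the target token at the first decoder position.

    With a one-word target, teacher forcing feeds only the start token to
    the decoder, so the loss is -log softmax(logits)[target_id].
    """
    peak = max(first_step_logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in first_step_logits))
    return log_norm - first_step_logits[target_id]


# Two made-up labelled pairs: one relevant, one not.
labelled = [
    ("what is monot5", "MonoT5 is a T5-based reranker.", True),
    ("what is monot5", "The weather is sunny today.", False),
]
for source, target in (make_example(*row) for row in labelled):
    print(repr(source), "->", target)

# Toy logits over a 3-token vocabulary where index 1 is the correct target.
print(target_token_loss([0.0, 2.0, -1.0], target_id=1))
```

The loss shrinks as the model puts more probability mass on the correct word, which is the signal fine-tuning uses to separate relevant from non-relevant pairs.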
Efficiency:
- Only one decoding step is needed: the scores for “true” and “false” are both read from the first output position, and no further text is generated
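- Per-pair cost is dominated by the encoder pass over the query and passage, so latency is close to that of a cross-encoder such as MonoBERT of similar size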
Variants and history
MonoT5 (2020) showed that generative models can rerank effectively, opening the door to zero-shot and few-shot reranking with large LMs. DuoT5 added a pairwise preference stage on top of MonoT5. RankT5 fine-tuned T5 with pairwise and listwise ranking losses so that it outputs ranking scores directly. RankGPT extended the paradigm to ChatGPT and GPT-4 through prompting alone, without any fine-tuning. MonoT5 with T5-3B was the state of the art on MS MARCO at publication and remains competitive.
When to use it
Use MonoT5 when:
- Reranking quality is the priority and latency allows for a generative model
- You need a reranker that generalizes zero-shot to new domains without retraining (use T5-3B or larger for the best transfer)
- You already have a T5 model in your infrastructure
If latency is the main constraint, MonoBERT-base is the faster option. If quality matters most and LLM-based rerankers are not an option, MonoT5-3B is a strong choice.