MonoT5

What it is

MonoT5 (Nogueira et al., 2020) is a reranking model that frames relevance scoring as text generation. Given a query and a passage, T5 is prompted to generate either “true” (relevant) or “false” (not relevant). The log-probability assigned to “true”, normalized over just these two tokens, serves as the relevance score. This formulation reuses T5’s text-to-text pre-training directly instead of bolting on a classification head, and it tends to transfer to new domains without retraining.

[illustrate: Prompt with query + passage → T5 → token probabilities for “true”/“false” → relevance score = log P(“true”)]
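
The score can also be written out explicitly. The notation here is an assumption introduced for illustration: z_true and z_false denote the logits the decoder assigns to the “true” and “false” tokens at the first output position for query q and passage d, and σ is the logistic sigmoid. Normalizing over just these two tokens, as the original paper does, gives:

```latex
\begin{aligned}
P(\text{true} \mid q, d)
  &= \frac{e^{z_{\text{true}}}}{e^{z_{\text{true}}} + e^{z_{\text{false}}}}
   = \sigma\bigl(z_{\text{true}} - z_{\text{false}}\bigr), \\
s(q, d)
  &= \log P(\text{true} \mid q, d)
   = -\log\bigl(1 + e^{-(z_{\text{true}} - z_{\text{false}})}\bigr).
\end{aligned}
```

Since s(q, d) increases monotonically with the margin z_true − z_false, ranking passages by the score or by the margin yields the same order.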

How it works

  1. Prompt template:

    Query: {query} Document: {passage} Relevant:
    
  2. Scoring:

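    • Run the encoder on the prompt, then a single decoder step
    • Read the logits of the “▁true” and “▁false” tokens at that step
    • Relevance score = log P("▁true"), with the softmax taken over only these two tokens; this ranks passages identically to log P("▁true") − log P("▁false")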
  3. Training:

    • Fine-tune T5 on MS MARCO (or domain data) with binary relevance labels
    • Teacher forcing with “true”/“false” as targets
  4. Efficiency:

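    • Only one decoder step is needed: the score is read from the logits of two candidate tokens at the first output position, and no text is actually generated
    • Cost is dominated by the encoder pass over the query-passage pair, so inference is roughly comparable to a BERT-style cross-encoder such as MonoBERT with an encoder of similar size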
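
The prompt, training, and scoring steps above can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names (build_prompt, training_example, relevance_score, rerank) are hypothetical, and the logits for “▁true” and “▁false” are assumed to have already been read from the first decoder step of a fine-tuned T5 model, so the example runs without downloading any checkpoint.

```python
from __future__ import annotations

import math


def build_prompt(query: str, passage: str) -> str:
    """Fill the MonoT5 input template for one query-passage pair."""
    return f"Query: {query} Document: {passage} Relevant:"


def training_example(query: str, passage: str, relevant: bool) -> tuple[str, str]:
    """Return (input text, target text) for seq2seq fine-tuning."""
    return build_prompt(query, passage), "true" if relevant else "false"


def relevance_score(true_logit: float, false_logit: float) -> float:
    """log P("true") with the softmax restricted to the two target tokens.

    Equal to log sigmoid(true_logit - false_logit), so it is never positive.
    """
    margin = true_logit - false_logit
    # Numerically stable log-sigmoid: never call exp() on a large positive value.
    if margin >= 0:
        return -math.log1p(math.exp(-margin))
    return margin - math.log1p(math.exp(margin))


def rerank(candidates: list[tuple[str, float, float]]) -> list[tuple[str, float]]:
    """Sort (passage, true_logit, false_logit) triples by descending score."""
    scored = [(p, relevance_score(t, f)) for p, t, f in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical logits, standing in for real first-step decoder outputs.
candidates = [
    ("Passage about cooking pasta.", -1.0, 2.0),
    ("Passage about T5 rerankers.", 3.0, -0.5),
    ("Passage loosely about ranking.", 0.5, 0.5),
]
ranked = rerank(candidates)
for passage, score in ranked:
    print(f"{score:8.4f}  {passage}")
```

Because the score depends only on the margin between the two logits, sorting by log P("▁true") − log P("▁false") would produce the same order; the restricted softmax simply maps that margin to a log-probability.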

Variants and history

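MonoT5 (2020) showed that a sequence-to-sequence model can rerank effectively, which opened the door to zero-shot and few-shot reranking with large LMs. DuoT5 added a pairwise preference stage on top of MonoT5. RankGPT extended the generative approach to instruction-tuned LLMs such as GPT-3.5 and GPT-4, prompting them to order passages without any fine-tuning. RankT5 fine-tuned T5 to output ranking scores directly, trained with ranking losses including listwise ones. MonoT5 with T5-3B reported results at or near the state of the art on MS MARCO passage ranking at publication and remains a competitive baseline.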

When to use it

Use MonoT5 when:

  • Reranking quality is the priority and the latency budget allows a T5 forward pass for every query-passage pair
  • You need zero-shot reranking on new domains without retraining (prefer T5-3B or larger)
  • You already have a T5 model in your infrastructure

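When latency is the main constraint, a smaller cross-encoder such as MonoBERT-base is the faster option; when quality matters most and LLM-based rerankers are off the table, MonoT5-3B is a strong choice.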

See also