TAS-B

What it is

TAS-B (TAS-Balanced: Balanced Topic Aware Sampling; Hofstätter et al., 2021) is a dense retrieval model built on a 6-layer DistilBERT bi-encoder. It is trained efficiently by composing each batch from topically related queries, balancing the sampled passage pairs across teacher score margins, and distilling soft supervision signals from teacher models. It achieves strong MS MARCO performance while being cheaper to train than ANCE-style dynamic negative mining.

[illustrate: Cross-encoder teacher scoring query-passage pairs; soft labels distilled to bi-encoder student; topic-balanced batch construction]

How it works

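  1. Topic-aware, balanced sampling:

    • Cluster training queries once before training, using k-means over query embeddings from a baseline dense retriever
    • Compose each batch from queries of a single cluster, so in-batch negatives are topically related and harder to separate
    • Balance the passage pairs sampled per query across teacher score margins, so easy pairs with large margins do not dominate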
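  2. Dual supervision:

    • Pairwise teacher: a cross-encoder (BERT_CAT) scores each query's labeled positive passage and a sampled negative
    • In-batch negative teacher: a ColBERT model scores each query against the other passages in the batch
    • Both signals are distilled with a Margin-MSE loss (knowledge distillation) and combined as a weighted sum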
  3. Efficiency advantage:

    • No dynamic index rebuilding required (unlike ANCE)
    • Pairwise teacher scores are pre-computed offline
    • Training fits on a single consumer-grade GPU in under 48 hours, per the paper
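
The topic-aware sampling step can be sketched as follows. In the paper, queries are clustered once before training and each batch is then drawn from one cluster. This is a minimal sketch, not the authors' code: the cluster assignments are assumed to be given (for example, from k-means over query embeddings), and the names group_by_cluster and sample_topic_aware_batch are hypothetical.

```python
import random
from collections import defaultdict


def group_by_cluster(cluster_of_query):
    """Invert a query_id -> cluster_id mapping into cluster_id -> [query_id, ...]."""
    clusters = defaultdict(list)
    for query_id, cluster_id in cluster_of_query.items():
        clusters[cluster_id].append(query_id)
    return dict(clusters)


def sample_topic_aware_batch(clusters, batch_size, rng):
    """Draw one batch of queries that all come from a single, randomly chosen cluster."""
    cluster_id = rng.choice(sorted(clusters))
    members = clusters[cluster_id]
    # Small clusters yield a smaller batch here; a real pipeline might merge or skip them.
    k = min(batch_size, len(members))
    return cluster_id, rng.sample(members, k)


# Hypothetical assignments: 12 queries spread evenly over 3 clusters.
cluster_of_query = {f"q{i}": i % 3 for i in range(12)}
clusters = group_by_cluster(cluster_of_query)
rng = random.Random(0)
cluster_id, batch = sample_topic_aware_batch(clusters, batch_size=4, rng=rng)
```

Because every query in the batch shares a cluster, the passages of the other queries serve as topically related in-batch negatives, with no index refresh needed during training.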

Example

Batch construction:
  Query clusters: A (biology), B (history), C (tech)
  Batch 1: 4 queries, all from cluster A
  Batch 2: 4 queries, all from cluster C
  → Each batch draws from a single topic cluster
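
The "balanced" part of TAS-Balanced refers to how passage pairs are chosen for each query: pairs are binned by the teacher's score margin between positive and negative, and a bin is picked uniformly before a pair is drawn from it. The sketch below illustrates that idea only. The equal-width binning, the bin count, and the function names are assumptions, and the scores are made up.

```python
import random


def bin_pairs_by_margin(pairs, num_bins):
    """Group (pos_id, neg_id, teacher_margin) triples into equal-width margin bins."""
    margins = [margin for _, _, margin in pairs]
    lo, hi = min(margins), max(margins)
    width = (hi - lo) / num_bins or 1.0  # avoid a zero width when all margins are equal
    bins = [[] for _ in range(num_bins)]
    for pair in pairs:
        idx = min(int((pair[2] - lo) / width), num_bins - 1)
        bins[idx].append(pair)
    return [b for b in bins if b]  # keep only non-empty bins


def sample_balanced_pair(bins, rng):
    """Pick a bin uniformly at random, then a pair uniformly within that bin."""
    return rng.choice(rng.choice(bins))


# Toy data: three small-margin pairs and one large-margin pair for the same query.
pairs = [("p1", "n1", 0.5), ("p1", "n2", 0.7), ("p1", "n3", 0.9), ("p1", "n4", 8.0)]
bins = bin_pairs_by_margin(pairs, num_bins=2)
rng = random.Random(0)
draws = [sample_balanced_pair(bins, rng) for _ in range(2000)]
large_margin_share = sum(1 for d in draws if d[1] == "n4") / len(draws)
```

Sampling pairs uniformly would pick the large-margin pair about a quarter of the time in this toy setup; sampling bins uniformly raises that to roughly half, which is the balancing effect.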

Per-batch loss:
  pair_loss     = MarginMSE(student margins for (q, p+, p-), cross-encoder margins)
  in_batch_loss = MarginMSE(student margins over in-batch negatives, ColBERT margins)
  total_loss    = pair_loss + α * in_batch_loss
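
The TAS-B paper distills with a Margin-MSE objective, which matches the student's positive-minus-negative score margin to the teacher's, and sums a pairwise term with a weighted in-batch term. The following is a minimal pure-Python sketch under simplifying assumptions: only the other queries' positives act as in-batch negatives, the weight alpha is an arbitrary placeholder, and all function names and scores are illustrative.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared difference between student and teacher (pos - neg) margins."""
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(errors) / len(errors)


def in_batch_margin_mse(student, teacher):
    """Margin-MSE where query i's positive is passage i and every other passage is a negative.

    student[i][j] and teacher[i][j] are the scores of query i against passage j.
    """
    errors = []
    for i in range(len(student)):
        for j in range(len(student[i])):
            if j == i:
                continue
            s_margin = student[i][i] - student[i][j]
            t_margin = teacher[i][i] - teacher[i][j]
            errors.append((s_margin - t_margin) ** 2)
    return sum(errors) / len(errors)


def dual_supervision_loss(pair_loss, in_batch_loss, alpha=0.75):
    """Weighted sum of the two terms; the default alpha is a placeholder, not a tuned value."""
    return pair_loss + alpha * in_batch_loss


# Toy scores, made up for illustration.
pair = margin_mse([2.0, 1.0], [1.0, 0.5], [5.0, 3.0], [2.0, 2.5])
in_batch = in_batch_margin_mse([[2.0, 1.0], [0.0, 3.0]], [[5.0, 4.0], [1.0, 4.0]])
total = dual_supervision_loss(pair, in_batch)
```

Matching margins rather than raw scores lets the student use its own score range, which matters when a dot-product bi-encoder is taught by a teacher with a different output scale.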

Variants and history

TAS-B (2021) was among the first to systematically combine topic-aware sampling with distillation for dense retrieval. It showed that teacher signals substantially improve a bi-encoder without the index-refresh infrastructure that ANCE requires. Cross-encoder distillation was later adopted in SPLADE++, ColBERTv2, and the GPL (Generative Pseudo Labeling) framework for domain adaptation.

When to use it

Use TAS-B when:

  • MS MARCO-style training data is available
  • Infrastructure for dynamic index refreshing (ANCE) is not available
  • A cross-encoder is available to serve as a teacher
  • You need a strong bi-encoder baseline without dynamic hard-negative mining

See also