TAS-B

What it is

TAS-B (Topic Aware Sampling, Balanced; Hofstätter et al., 2021) is a dense retrieval model trained with a batch-sampling strategy that groups topically similar queries into each batch and balances the teacher score margins of the sampled passage pairs, while teacher models (a cross-encoder among them) provide soft supervision signals. It achieves strong MS MARCO performance while being cheaper to train than ANCE-style dynamic negative mining, which has to refresh its index during training.

[illustrate: Cross-encoder teacher scoring query-passage pairs; soft labels distilled to bi-encoder student; topic-balanced batch construction]

How it works

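Topic-aware balanced sampling:
- Cluster training queries by topic once before training (k-means over query embeddings from a baseline retriever)
- Construct each batch from the queries of a single cluster, so that in-batch negatives are topically related and therefore harder than random ones
- Balance the passage pairs sampled per query so that teacher score margins are spread evenly instead of being dominated by easy pairs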
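Dual supervision:
- Pairwise teacher: a cross-encoder scores each (query, positive, negative) triple, with positives taken from the MS MARCO relevance annotations; the student learns to match the teacher's score margin (Margin-MSE, a form of knowledge distillation)
- In-batch negatives teacher: a ColBERT model scores the query-passage combinations formed within the batch
- Combines both via a weighted loss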
Efficiency advantage:
- No dynamic index rebuilding required (unlike ANCE)
- Pairwise teacher scores can be pre-computed offline
- Training is stable and cheaper than dynamic negative approaches; the paper reports training on a single consumer-grade GPU in under 48 hours
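
The "balanced" half of the sampling can be illustrated with a minimal Python sketch. It assumes the teacher scores for one query are already available as plain floats, and that negatives are grouped into equal-width margin bins from which a bin is picked uniformly. The function name `sample_balanced_pair`, the default of 10 bins, and the demo scores are illustrative choices, not taken from the original implementation.

```python
import random

def sample_balanced_pair(pos_score, neg_scores, num_bins=10, rng=random):
    """Pick one negative passage id so that teacher margins are sampled evenly.

    pos_score:  teacher score of the query's positive passage
    neg_scores: dict mapping negative passage id -> teacher score
    """
    # Teacher margin of each (positive, negative) pair.
    margins = {pid: pos_score - score for pid, score in neg_scores.items()}
    lo, hi = min(margins.values()), max(margins.values())
    width = (hi - lo) / num_bins or 1.0  # avoid zero width when all margins are equal

    # Group negatives into equal-width margin bins.
    bins = {}
    for pid, margin in margins.items():
        idx = min(int((margin - lo) / width), num_bins - 1)
        bins.setdefault(idx, []).append(pid)

    # Uniform choice over non-empty bins, then uniform choice inside the bin.
    chosen_bin = rng.choice(sorted(bins))
    return rng.choice(bins[chosen_bin])

# Hypothetical teacher scores: nine easy negatives and one hard negative.
neg_scores_demo = {f"easy{i}": i / 10 for i in range(9)}
neg_scores_demo["hard"] = 9.0
demo_rng = random.Random(0)
demo_picks = [sample_balanced_pair(10.0, neg_scores_demo, rng=demo_rng)
              for _ in range(3000)]
```

Sampling negatives uniformly would pick the single hard negative about one time in ten; sampling a bin first gives it a much larger share, which is the effect the balancing is after.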

Example

Batch construction:
  Batch 1: queries drawn from topic cluster A (biology)
  Batch 2: queries drawn from topic cluster B (history)
  Batch 3: queries drawn from topic cluster C (tech)
  → Each batch stays within one topic; the topic changes from batch to batch
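
In the TAS-B paper, every batch is drawn from a single query cluster. The Python sketch below shows that scheme under simplifying assumptions: cluster ids are given up front (in practice they would come from k-means over query embeddings), each cluster is cut into fixed-size batches, and leftover queries are dropped. The helper `build_topic_batches` and the toy cluster names are hypothetical.

```python
import random
from collections import defaultdict

def build_topic_batches(query_clusters, batch_size, seed=0):
    """Return a shuffled list of (cluster_id, [query ids]) batches.

    query_clusters: dict mapping query id -> cluster id
    Every batch contains queries from exactly one cluster.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for qid, cid in query_clusters.items():
        by_cluster[cid].append(qid)

    batches = []
    for cid, qids in by_cluster.items():
        rng.shuffle(qids)
        # Cut the cluster into full batches; incomplete remainders are dropped here.
        for start in range(0, len(qids) - batch_size + 1, batch_size):
            batches.append((cid, qids[start:start + batch_size]))

    # Shuffle so that consecutive batches come from different topics.
    rng.shuffle(batches)
    return batches

# Toy data: 24 queries spread evenly over three topic clusters.
query_clusters = {f"q{i}": ("biology", "history", "tech")[i % 3] for i in range(24)}
batches = build_topic_batches(query_clusters, batch_size=4)
```

Because all queries in a batch share a topic, the passages of the other queries in the batch act as topically related in-batch negatives at no extra encoding cost.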

Per-batch loss:
  pair_loss  = MSE(student_pos - student_neg, teacher_pos - teacher_neg)
  inb_loss   = the same margin loss over in-batch pairs, using the in-batch teacher's scores
  total_loss = pair_loss + λ * inb_loss
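
The dual-teacher loss of the TAS-B paper combines a pairwise Margin-MSE term with an in-batch term. The dependency-free Python sketch below is one plausible reading of it and makes several assumptions: scores are plain floats, the passage pool of a batch of n queries is laid out as n positives followed by n negatives, and the in-batch term compares each query's own positive against every passage belonging to other queries. The names `margin_mse`, `dual_supervision_loss`, and `lam` are illustrative; a real training loop would use a tensor library.

```python
def margin_mse(student_margins, teacher_margins):
    """Mean squared error between student and teacher score margins."""
    if not student_margins:
        return 0.0
    squared = [(s - t) ** 2 for s, t in zip(student_margins, teacher_margins)]
    return sum(squared) / len(squared)

def dual_supervision_loss(student, pair_teacher, inb_teacher, lam=1.0):
    """Pairwise Margin-MSE plus a weighted in-batch Margin-MSE.

    student:      n x 2n student scores; column i is the positive of query i,
                  column n + i its sampled negative
    pair_teacher: n pairs (teacher_pos, teacher_neg) from the pairwise teacher
    inb_teacher:  n x 2n scores from the in-batch teacher, same layout as student
    """
    n = len(student)

    # Pairwise term: each query's own positive versus its own negative.
    pair_s = [student[i][i] - student[i][n + i] for i in range(n)]
    pair_t = [pos - neg for pos, neg in pair_teacher]

    # In-batch term: own positive versus every passage of the other queries.
    inb_s, inb_t = [], []
    for i in range(n):
        for j in range(2 * n):
            if j in (i, n + i):
                continue  # skip the query's own positive and own negative
            inb_s.append(student[i][i] - student[i][j])
            inb_t.append(inb_teacher[i][i] - inb_teacher[i][j])

    return margin_mse(pair_s, pair_t) + lam * margin_mse(inb_s, inb_t)

# Toy batch with two queries (hypothetical scores).
student_scores = [[3.0, 1.0, 2.0, 0.0],
                  [0.5, 2.5, 0.0, 1.5]]
loss_match = dual_supervision_loss(
    student_scores, [(3.0, 2.0), (2.5, 1.5)], student_scores)
loss_off = dual_supervision_loss(
    student_scores, [(3.0, 1.0), (2.5, 1.5)], student_scores)
```

Matching margins rather than absolute scores lets the student and the teachers live on different score scales, which matters because a dot-product bi-encoder and a cross-encoder rarely produce comparable raw values.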

Variants and history

TAS-B (2021) introduced topic-aware sampling and combined it with teacher distillation for dense retrieval. It showed that teacher supervision substantially improves a bi-encoder without the infrastructure cost of ANCE. Cross-encoder distillation of a similar kind was later used in SPLADE++, ColBERTv2, and the GPL (Generative Pseudo Labeling) framework for domain adaptation.

When to use it

Use TAS-B when:
- MS MARCO-style training data is available
- Infrastructure for dynamic index refreshing (ANCE) is not available
- A cross-encoder is available to serve as a teacher
- You need a strong bi-encoder baseline without exotic training tricks