TAS-B
What it is
TAS-B (Balanced Topic Aware Sampling, also written TAS-Balanced; Hofstätter et al., 2021) is a dense retrieval model, a DistilBERT-based bi-encoder, that makes training more efficient in two ways: it draws the queries of each batch from a single topic cluster and balances the sampled passage pairs by teacher score margin, and it distills soft supervision signals from stronger teacher models. It achieves strong MS MARCO performance while being cheaper to train than ANCE-style dynamic negative mining, which has to refresh its index during training.
[illustrate: Cross-encoder teacher scoring query-passage pairs; soft labels distilled to bi-encoder student; topic-balanced batch construction]
How it works
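Topic-aware balanced sampling:
- Cluster the training queries once before training, using k-means over query embeddings from a baseline dense retriever
- Draw the queries of each batch from a single cluster, so that in-batch negatives are topically related
- This avoids batches of unrelated queries, whose in-batch negatives are too easy to carry much training signal
- Balance the passage pairs sampled for each query so that teacher score margins are spread evenly, rather than dominated by the most common margin range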
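The sampling step can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes cluster assignments have already been computed (for example with k-means over query embeddings), and the function names, the equal-width binning, and the toy data are all hypothetical. The first two functions draw a whole batch from one cluster; the third shows one way to balance passage pairs by teacher margin, picking a margin bin uniformly before picking a pair inside it.

```python
import random
from collections import defaultdict


def group_by_cluster(cluster_of):
    """Invert a {query_id: cluster_id} mapping into {cluster_id: [query_ids]}."""
    by_cluster = defaultdict(list)
    for qid, cid in cluster_of.items():
        by_cluster[cid].append(qid)
    return by_cluster


def sample_topic_batch(by_cluster, batch_size, rng):
    """Pick one cluster at random, then draw the whole batch from it."""
    # Only clusters with enough queries can fill a batch.
    eligible = sorted(
        cid for cid, qids in by_cluster.items() if len(qids) >= batch_size
    )
    cid = rng.choice(eligible)
    return cid, rng.sample(by_cluster[cid], batch_size)


def sample_balanced_pair(pairs, num_bins, rng):
    """Pick one (pos_id, neg_id, teacher_margin) tuple for a query.

    Pairs are grouped into equal-width margin bins (an assumption of this
    sketch); a non-empty bin is chosen uniformly, then a pair inside it.
    """
    lo = min(margin for _, _, margin in pairs)
    hi = max(margin for _, _, margin in pairs)
    width = (hi - lo) / num_bins or 1.0  # guard against identical margins
    bins = defaultdict(list)
    for pair in pairs:
        idx = min(int((pair[2] - lo) / width), num_bins - 1)
        bins[idx].append(pair)
    chosen_bin = rng.choice(sorted(bins))
    return rng.choice(bins[chosen_bin])


# Toy usage: 30 queries spread over 3 hypothetical clusters.
rng = random.Random(0)
cluster_of = {f"q{i}": i % 3 for i in range(30)}
cid, batch = sample_topic_batch(group_by_cluster(cluster_of), 4, rng)
```

Sampling the bin first means a pair from a rare margin range is as likely to be chosen as one from a crowded range, which is the point of the balancing step.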
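Dual-teacher supervision:
- MS MARCO relevance annotations determine which passage is the positive for each query
- Pairwise teacher: a BERT cross-encoder scores each (query, positive, negative) triple, and the student is trained to reproduce the teacher's score margin (Margin-MSE knowledge distillation)
- In-batch teacher: a ColBERT model supplies scores for in-batch negative pairs, which would be too expensive to score with the cross-encoder
- The pairwise and in-batch losses are combined as a weighted sum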
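The TAS-B paper trains the student with a Margin-MSE distillation objective applied to two teachers, one for sampled passage pairs and one for in-batch negatives. The following is a minimal sketch of that idea in plain Python, assuming the scores are already available as flat lists; the function names and the weight argument are illustrative, and a real implementation would operate on tensors in a deep learning framework.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.

    Each argument is a list of scores, aligned by training example.
    The margin of an example is score(q, positive) - score(q, negative).
    """
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(
            student_pos, student_neg, teacher_pos, teacher_neg
        )
    ]
    return sum(errors) / len(errors)


def dual_teacher_loss(pair_scores, inbatch_scores, weight):
    """Weighted sum of the pairwise and in-batch distillation losses.

    pair_scores:    (student_pos, student_neg, teacher_pos, teacher_neg)
                    for the sampled pairs, teacher = cross-encoder.
    inbatch_scores: the same four lists for in-batch negative pairs,
                    teacher = a second, cheaper model.
    weight:         hypothetical scaling factor for the in-batch term.
    """
    pair_loss = margin_mse(*pair_scores)
    inbatch_loss = margin_mse(*inbatch_scores)
    return pair_loss + weight * inbatch_loss
```

Matching margins rather than raw scores lets the student use its own score range: only the gap between the positive and the negative has to agree with the teacher.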
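Efficiency advantage:
- No dynamic index rebuilding is required (unlike ANCE)
- Queries are clustered once before training, and pairwise teacher scores are pre-computed offline
- Training is stable and faster than dynamic negative mining; the paper reports training on a single consumer-grade GPU in under 48 hours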
Example
Batch construction:
Batch 1: 4 queries from topic cluster A (biology)
Batch 2: 4 queries from topic cluster B (history)
Batch 3: 4 queries from topic cluster C (tech)
→ Each batch stays within one topic; the topic changes from batch to batch
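Per-batch loss:
margin(q, p+, p-) = score(q, p+) - score(q, p-)
pair_loss = MSE(student_margin(q, p+, p-), cross_encoder_margin(q, p+, p-))
inbatch_loss = MSE(student_margin(q, p+, p_inbatch), colbert_margin(q, p+, p_inbatch))
total_loss = pair_loss + λ * inbatch_loss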
Variants and history
TAS-B (2021) was among the first to systematically combine topic-aware sampling with knowledge distillation for dense retrieval. It showed that teacher signals can substantially improve a bi-encoder without ANCE's infrastructure cost of periodically re-encoding the corpus and rebuilding the index. Cross-encoder distillation was later also adopted in SPLADE++, ColBERTv2, and the GPL (Generative Pseudo Labeling) framework for domain adaptation.
When to use it
Use TAS-B when:
- MS MARCO-style training data is available
- Infrastructure for dynamic index refreshing (ANCE) is not available
- A cross-encoder is available to serve as a teacher
- You need a strong bi-encoder baseline without exotic training tricks