Atlas
What it is
Atlas (Izacard et al., 2022) is a retrieval-augmented language model that pairs a Fusion-in-Decoder (FiD) reader with a Contriever retriever and fine-tunes both components jointly. The headline result: with only 64 training examples, the 11B-parameter Atlas reaches over 42% accuracy on Natural Questions, outperforming the 540B-parameter PaLM by about 3 points despite having 50x fewer parameters. This showed that, on knowledge-intensive tasks, retrieval augmentation can substitute for a large share of parameter count.
[illustrate: Query → Contriever retriever → top-k passages → FiD encoder-decoder → answer; gradient flows through both retriever and reader during joint fine-tuning]
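The retrieve-then-read data flow can be sketched in a few lines of NumPy. This is a toy illustration, not Atlas code: random vectors stand in for Contriever embeddings, `toy_encode` is a hypothetical stand-in for the T5 encoder, and the three passages passed to the reader are made-up token lists rather than the actual top-k hits. The point is the shape of the computation: dot-product scoring over the whole index, then FiD's "encode each passage separately, concatenate for the decoder" step.

```python
import numpy as np

def retrieve_top_k(query_emb, passage_embs, k):
    """Bi-encoder retrieval: score every passage by dot product with the query
    embedding and return the indices and scores of the k best."""
    scores = passage_embs @ query_emb          # (num_passages,)
    top = np.argsort(-scores)[:k]              # highest scores first
    return top, scores[top]

def fid_fuse(query_tokens, passages, encode):
    """FiD-style fusion: encode each (query, passage) pair on its own, then
    concatenate the encoder outputs along the sequence axis. The decoder
    would cross-attend over this single long sequence."""
    encoded = [encode(query_tokens + p) for p in passages]  # each (len_i, d_model)
    return np.concatenate(encoded, axis=0)                   # (sum of len_i, d_model)

# Toy setup: random vectors stand in for Contriever embeddings and a T5 encoder.
rng = np.random.default_rng(0)
d_model = 8
passage_embs = rng.normal(size=(100, d_model))   # 100 indexed passages
query_emb = rng.normal(size=d_model)
top, top_scores = retrieve_top_k(query_emb, passage_embs, k=5)

token_table = rng.normal(size=(50, d_model))     # hypothetical token-embedding table

def toy_encode(token_ids):
    return token_table[token_ids]                # stand-in for the T5 encoder

query_tokens = [1, 2]
passages = [[10, 11, 12], [20, 21], [30, 31, 32, 33]]   # token ids of 3 retrieved passages
fused = fid_fuse(query_tokens, passages, toy_encode)
print(top, fused.shape)   # fused.shape is (15, 8): (2+3) + (2+2) + (2+4) tokens
```

Because each passage is encoded separately, encoder cost grows linearly with the number of passages, while the decoder still sees all of them in one sequence.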
How it works
- Retriever: Contriever (or its multilingual variant, mContriever), a bi-encoder that embeds queries and passages separately and ranks passages by dot-product similarity
- Reader: a Fusion-in-Decoder (FiD) T5 model. The encoder processes each (query, passage) pair independently; the decoder attends over the concatenation of all encoded passages to generate the answer
- Joint fine-tuning:
  - Reader loss: cross-entropy on the answer tokens
  - Retriever loss: Attention Distillation, which raises the retrieval scores of the passages the reader attends to most heavily
  - Both components are updated jointly by backpropagation
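- Efficient joint training:
  - The reader's attention over passages supplies a training signal for the retriever, so no explicit relevance labels are needed
  - "Perplexity Distillation" variant: train the retriever so that the passages which most increase the reader's likelihood of the correct answer (i.e., most reduce its perplexity) receive the highest retrieval scores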
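Both retriever objectives can be sketched as a KL divergence between a reader-derived target distribution over the top-k passages and the retriever's softmax over its own scores. This is a minimal NumPy sketch under our reading of the Atlas paper, not its implementation: the function names are hypothetical, the temperature value is an assumption, the attention target uses a simplified aggregation (plain averaging of attention weights; the paper's exact aggregation may differ), and only loss values are computed. In real training the target is held fixed and gradients flow into the retriever only.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(x, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two distributions over the same k passages."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def retriever_distribution(scores, temperature=0.1):
    """Retriever's distribution over its top-k passages: softmax(score / temperature).
    The temperature value is an illustrative assumption."""
    return softmax(np.asarray(scores, dtype=float) / temperature)

def attention_distillation_target(cross_attn, passage_ids):
    """Target from reader cross-attention (simplified aggregation).
    cross_attn:  (layers, heads, target_len, source_len) decoder cross-attention weights
    passage_ids: (source_len,) passage index of each encoder token in the FiD concatenation
    Averages over layers, heads and target tokens, then over each passage's tokens."""
    per_token = cross_attn.mean(axis=(0, 1, 2))                       # (source_len,)
    k = int(passage_ids.max()) + 1
    totals = np.bincount(passage_ids, weights=per_token, minlength=k)
    counts = np.bincount(passage_ids, minlength=k)
    return softmax(totals / counts)

def perplexity_distillation_target(answer_logprobs):
    """answer_logprobs[i] = log p_reader(answer | query, passage i).
    Passages that make the gold answer more likely get more target mass."""
    return softmax(answer_logprobs)

# Usage with made-up numbers for k = 3 retrieved passages.
retriever_scores = [2.0, 1.0, 0.5]
p_retr = retriever_distribution(retriever_scores)

# Perplexity Distillation: the reader finds the answer most likely given passage 0.
pdist_target = perplexity_distillation_target([-1.0, -3.0, -5.0])
pdist_loss = kl_divergence(pdist_target, p_retr)

# Attention Distillation: 2 layers, 2 heads, 3 answer tokens, 6 encoder tokens.
passage_ids = np.array([0, 0, 0, 1, 1, 2])
cross_attn = np.empty((2, 2, 3, 6))
cross_attn[..., :3] = 0.30   # passage 0 tokens get most of the attention
cross_attn[..., 3:5] = 0.04  # passage 1
cross_attn[..., 5] = 0.02    # passage 2
adist_target = attention_distillation_target(cross_attn, passage_ids)
adist_loss = kl_divergence(adist_target, p_retr)
```

The two objectives share the same loss; only the target changes. As sketched here, Attention Distillation reads the cross-attention from a single reader pass over all passages, while Perplexity Distillation needs the reader's answer log-likelihood conditioned on each passage separately.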
Variants and history
Atlas (2022, Meta AI) established that a retrieval-augmented model with far fewer parameters can compete with much larger models on knowledge tasks. The result fed the broader trend toward retrieval augmentation as an alternative to simply scaling language model parameters. RA-DIT (2023) carried retriever-reader co-training over to instruction-tuned LLMs, fine-tuning the LLM and the retriever in two separate stages. The Attention Distillation signal (training the retriever on the reader's attention), which predates Atlas (Izacard & Grave, 2021), has been reused in several subsequent papers.
When to use it
Use Atlas as a reference architecture when:
- You want a jointly trained retriever + reader system
- Few-shot performance on knowledge-intensive tasks matters
- You want to see how much retrieval augmentation can substitute for model scale
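For production RAG, simpler pipelines whose retriever and reader are trained independently (e.g., DPR + T5, or DPR + an LLM) are usually preferred. They are easier to build and maintain, and they avoid a cost specific to joint training: updating the retriever changes the passage embeddings, so the document index has to be re-encoded periodically.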