Atlas

What it is

Atlas (Izacard et al., 2022) is a retrieval-augmented language model that pairs a Fusion-in-Decoder (FiD) reader with a Contriever retriever and trains the two components jointly. The key finding: with retrieval, an 11B-parameter Atlas matches GPT-3 (175B) on several knowledge-intensive benchmarks in few-shot settings, and on Natural Questions it reaches over 42% accuracy using only 64 training examples, outperforming a 540B-parameter model by about 3 points. This showed that retrieval augmentation can substitute for much of the parameter count when a task depends on stored knowledge.

[illustrate: Query → Contriever retriever → top-k passages → FiD encoder-decoder → answer; gradient flows through both retriever and reader during joint fine-tuning]

How it works

Retriever: Contriever or mContriever bi-encoder for dense passage retrieval
Reader: FiD (Fusion-in-Decoder) T5 model that processes passages independently and generates answers
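
The two components above can be sketched with toy data. This is a minimal illustration, not the Atlas implementation: it assumes dot-product similarity between precomputed embeddings, and the function names (`retrieve_top_k`, `build_reader_inputs`) and the prompt template are hypothetical. The point it shows is that each retrieved passage is paired with the question separately, so the reader's encoder handles passages independently before the decoder fuses them.

```python
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Dot-product similarity between two embedding vectors."""
    return sum(a * b for a, b in zip(u, v))


def retrieve_top_k(query_vec: List[float],
                   passage_vecs: List[List[float]],
                   k: int) -> List[Tuple[int, float]]:
    """Score every passage against the query and keep the k best (index, score) pairs."""
    scored = [(i, dot(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def build_reader_inputs(question: str, passages: List[str]) -> List[str]:
    """One reader input per passage; the template is an assumption, not Atlas's exact format."""
    return [f"question: {question} context: {p}" for p in passages]


# Toy embeddings standing in for the output of a bi-encoder.
passages = ["Paris is the capital of France.",
            "Bananas are rich in potassium.",
            "France is a country in Europe."]
passage_vecs = [[0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
query_vec = [1.0, 0.0]

top = retrieve_top_k(query_vec, passage_vecs, k=2)
top_ids = [i for i, _ in top]
reader_inputs = build_reader_inputs("What is the capital of France?",
                                    [passages[i] for i in top_ids])
```

In a real system the exhaustive scoring loop would be replaced by an approximate nearest-neighbor index, and each string in `reader_inputs` would be encoded separately, with the decoder attending over the concatenated encoder outputs.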
Joint fine-tuning:
- Reader loss: cross-entropy on answer tokens
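- Retriever loss: Attention Distillation, in which passages the reader attends to heavily are pushed toward higher retrieval scores
- Both components are updated in the same training step; the reader's attention is treated as a fixed target, so the retriever loss updates only the retriever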
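
The Attention Distillation objective listed above can be sketched as a KL divergence between a target distribution derived from reader attention and the retriever's softmax over passage scores. This is a minimal sketch under stated assumptions: the per-passage attention values are taken as already aggregated into one non-negative number per passage, they are normalized by simple division, and the function names are hypothetical. In a real training loop the target would be detached from the computation graph so that only the retriever receives this gradient.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def attention_distillation_loss(retriever_scores: List[float],
                                reader_attention: List[float]) -> float:
    """KL between the attention-derived target and the retriever distribution."""
    p_retr = softmax(retriever_scores)
    total = sum(reader_attention)
    p_attn = [a / total for a in reader_attention]  # treated as a constant target
    return kl_divergence(p_attn, p_retr)


# Toy values: the reader relied mostly on the first passage.
reader_attention = [0.5, 0.3, 0.2]
aligned_scores = [math.log(a) for a in reader_attention]   # retriever already agrees
misaligned_scores = [-1.6, -1.2, -0.7]                      # retriever prefers the last passage

aligned_loss = attention_distillation_loss(aligned_scores, reader_attention)
misaligned_loss = attention_distillation_loss(misaligned_scores, reader_attention)
```

The loss is near zero when the retriever's ranking already mirrors the reader's attention and grows as the two disagree, which is what lets the reader supervise the retriever without relevance labels.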
Efficient joint training:
- Reader attention over passages provides a training signal for the retriever without explicit relevance labels
- “Perplexity Distillation” variant: supervise the retriever with how much each passage improves the reader's perplexity on the target output
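
Perplexity Distillation can be sketched in the same style. The sketch below assumes the reader has already produced a log-likelihood of the gold answer conditioned on each retrieved passage in turn; the target is a softmax over those log-likelihoods, and the retriever is trained to match it. The function names and the `temperature` parameter are illustrative assumptions, not the paper's code.

```python
import math
from typing import List


def log_softmax(xs: List[float]) -> List[float]:
    """Numerically stable log-softmax."""
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def perplexity_distillation_loss(retriever_scores: List[float],
                                 answer_log_likelihoods: List[float],
                                 temperature: float = 1.0) -> float:
    """KL(target || retriever), where the target favors passages under which
    the reader assigns the answer a higher likelihood (lower perplexity)."""
    log_target = log_softmax([ll / temperature for ll in answer_log_likelihoods])
    log_retr = log_softmax(retriever_scores)
    return sum(math.exp(lt) * (lt - lr) for lt, lr in zip(log_target, log_retr))


# Toy values: the answer is most likely given passage 0, least likely given passage 2.
answer_log_likelihoods = [-1.0, -3.0, -5.0]
agreeing_scores = [2.0, 0.0, -2.0]    # same ordering and spacing as the target
reversed_scores = [-2.0, 0.0, 2.0]    # retriever ranks the passages backwards

agreeing_loss = perplexity_distillation_loss(agreeing_scores, answer_log_likelihoods)
reversed_loss = perplexity_distillation_loss(reversed_scores, answer_log_likelihoods)
```

Compared with attention-based targets, this signal needs one reader forward pass per passage to obtain the log-likelihoods, so it trades extra compute for a target tied directly to the reader's output quality.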

Variants and history

Atlas (2022, Meta AI) established that a retriever paired with a comparatively small model can compete with much larger models on knowledge tasks. The result influenced the broader trend toward retrieval-augmented systems as an alternative to scaling language model parameters. RA-DIT (2023) extended retriever and language model fine-tuning to instruction-tuned LLMs. The Attention Distillation training signal (teaching the retriever with reader attention) is reused in several subsequent papers.

When to use it

Use Atlas as a reference architecture when:

- You want a jointly trained retriever + reader system
- Few-shot performance on knowledge-intensive tasks matters
- You want to see how much retrieval augmentation can substitute for model scale

For production RAG, simpler pipelines with independently trained components (DPR + T5 or DPR + LLM) are usually preferred.
