FiD (Fusion-in-Decoder)
What it is
FiD (Fusion-in-Decoder, Izacard & Grave, 2020) is a reader architecture for open-domain QA. It encodes each retrieved passage independently with a T5 encoder, then concatenates the encoded representations so that a single decoder can attend over all of them. This sidesteps the quadratic self-attention cost of encoding all passages together as one long sequence, so the reader can make effective use of many more retrieved passages (up to 100 in the original experiments).
[illustrate: k retrieved passages → k independent T5 encoder calls → concatenated encoder outputs → single T5 decoder generates answer; attention spans all passages]
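The data flow described above can be sketched at the level of shapes. This is a minimal illustration, not the authors' implementation: `toy_encode`, `build_input`, and `fid_encode_and_fuse` are hypothetical names, the encoder is a stand-in that emits placeholder vectors instead of real T5 hidden states, whitespace splitting stands in for a real tokenizer, and the `question:` / `context:` prefixes are an assumed input format.

```python
from typing import List

Vector = List[float]
HiddenStates = List[Vector]  # one passage's encoder output, shape [L x d]


def toy_encode(tokens: List[str], d: int = 4) -> HiddenStates:
    """Stand-in for a T5 encoder: returns one d-dimensional vector per token.

    The values are placeholders; a real encoder returns contextual states.
    """
    return [[float(len(tok) + j) for j in range(d)] for tok in tokens]


def build_input(question: str, passage: str, max_len: int) -> List[str]:
    """Prepend the question to the passage, then truncate or pad to max_len."""
    tokens = f"question: {question} context: {passage}".split()
    tokens = tokens[:max_len]
    return tokens + ["<pad>"] * (max_len - len(tokens))


def fid_encode_and_fuse(
    question: str, passages: List[str], max_len: int = 16
) -> HiddenStates:
    """Encode each (question, passage) pair separately, then concatenate
    the results along the sequence axis."""
    fused: HiddenStates = []
    for passage in passages:
        states = toy_encode(build_input(question, passage, max_len))  # [L x d]
        fused.extend(states)  # passages do not attend to each other here
    return fused  # [(k * L) x d]: the memory the decoder cross-attends over


passages = [
    "Paris is the capital of France.",
    "France is in Europe.",
    "The Seine flows through Paris.",
]
memory = fid_encode_and_fuse("What is the capital of France?", passages)
print(len(memory), len(memory[0]))  # 48 4  (k * L = 3 * 16 rows, d = 4)
```

In a real system the per-passage encoder calls would typically be batched, and padding positions would be masked out of the decoder's cross-attention.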
How it works
- Independent encoding:
  - For each of the k retrieved passages: prepend the query, then encode with the T5 encoder
  - Each passage produces a sequence of hidden states [L × d] (L tokens, hidden size d)
  - k passages → k sequences of L hidden states (k × L vectors in total)
- Fusion in decoder:
  - Concatenate the k sequences along the sequence axis into one [(k × L) × d] representation
  - The T5 decoder cross-attends over this full concatenated representation
  - Passages interact only here: cross-attention spans all passages simultaneously
- Generation:
  - The decoder generates the answer autoregressively
  - Because every decoding step can attend to every passage, it aggregates evidence across passages
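- Scaling advantage:
  - Encoder self-attention is linear in k: k independent passes of cost O(L²) each, which can be batched or parallelized
  - vs. concatenation: one pass with quadratic self-attention over k × L tokens, i.e. O((k × L)²)
  - Decoder cross-attention still covers all k × L positions, but that cost grows only linearly in k
  - Enables k = 100 passages vs. k ≈ 5–10 for concatenation approaches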
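The scaling comparison above can be made concrete by counting query-key pairs per attention layer. The following is a rough sketch under simplifying assumptions: it ignores constant factors, the number of heads and layers, and feed-forward cost. The function names are hypothetical, and k = 100, L = 250, and a 20-token answer are illustrative values only.

```python
def encoder_pairs_fid(k: int, L: int) -> int:
    """Self-attention pairs when each of k passages (L tokens) is encoded alone."""
    return k * L * L


def encoder_pairs_concat(k: int, L: int) -> int:
    """Self-attention pairs when all k passages form one sequence of k * L tokens."""
    return (k * L) ** 2


def decoder_cross_pairs(k: int, L: int, answer_len: int) -> int:
    """Cross-attention pairs: each generated token attends to all k * L positions."""
    return answer_len * k * L


k, L = 100, 250
fid = encoder_pairs_fid(k, L)
concat = encoder_pairs_concat(k, L)
print(fid)            # 6250000
print(concat)         # 625000000
print(concat // fid)  # 100, i.e. a factor of k
print(decoder_cross_pairs(k, L, answer_len=20))  # 500000
```

Under these assumptions, concatenation costs k times more encoder attention than FiD, while the decoder's cross-attention term grows only linearly with the number of passages.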
Variants and history
FiD (2020) became a standard reader architecture for open-domain QA, substantially outperforming both single-passage readers and full-concatenation approaches. FiD-KD distilled knowledge from the reader into the retriever, using the reader's cross-attention scores as the training signal for retrieval. Atlas (2022) trained a FiD reader jointly with a retriever and showed strong few-shot performance. The independent-encoding pattern is now common in RAG systems where context window limits require chunked processing.
When to use it
Use FiD when:
- Open-domain QA requires synthesizing information across many passages
- Context window constraints prevent concatenating all retrieved passages
- A generative answer is required (not just passage ranking)
- You have an encoder-decoder model (T5 in the original work) and can batch or parallelize the encoder passes