FiD (Fusion-in-Decoder)

Fid Fusion-in-Decoder Retrieval-Augmented Sequence-to-Sequence Neural-Ir Needs-Review

What it is

FiD (Fusion-in-Decoder; Izacard & Grave, 2020) is a reader architecture for open-domain QA. It encodes each retrieved passage independently with a T5 encoder, then concatenates the encoded representations so the decoder can attend over all of them at once. Because encoder self-attention never spans more than one passage, its cost grows linearly rather than quadratically with the number of passages, which makes it practical to read many more retrieved passages (up to 100).
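To make the cost argument concrete, the sketch below counts the token pairs that encoder self-attention has to score per layer under each strategy. It is a back-of-the-envelope illustration, not a benchmark: the function names are hypothetical, and k = 100 and L = 250 are illustrative values chosen for the arithmetic, where k is the number of passages and L the token length of one query-plus-passage input.

```python
def self_attention_pairs_concat(k: int, L: int) -> int:
    """Token pairs scored per layer when all k passages form one long sequence."""
    n = k * L
    return n * n


def self_attention_pairs_fid(k: int, L: int) -> int:
    """Token pairs scored per layer when each of the k passages is encoded alone."""
    return k * L * L


k, L = 100, 250  # illustrative values, not taken from this note
concat_cost = self_attention_pairs_concat(k, L)
fid_cost = self_attention_pairs_fid(k, L)
ratio = concat_cost / fid_cost

print(concat_cost)  # 625000000
print(fid_cost)     # 6250000
print(ratio)        # 100.0
```

The ratio always equals k, since (k × L)² / (k × L²) = k. Independent encoding therefore saves more the more passages are read.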

[illustrate: k retrieved passages → k independent T5 encoder calls → concatenated encoder outputs → single T5 decoder generates answer; attention spans all passages]

How it works

Independent encoding:
- For each of k retrieved passages: prepend query, encode with T5 encoder
- Each query-passage input produces a sequence of hidden states of shape [L × d] (L tokens, hidden size d)
- k passages → k sequences, i.e. k × L hidden states in total
Fusion in decoder:
- Concatenate the k sequences along the token axis into one [(k × L) × d] representation
- T5 decoder attends over the full concatenated representation
- Cross-attention spans all passages simultaneously
Generation:
- Decoder generates the answer autoregressively
- Naturally aggregates information across passages
Scaling advantage:
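- Encoding is linear in k: k independent passes costing O(k × L²) in self-attention, which can be batched or parallelized
- vs. concatenation: self-attention over a single sequence of k × L tokens costs O((k × L)²)
- Decoder cross-attention still covers all k × L hidden states, but that cost is linear in k per generated token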
- Enables k = 100 passages vs. k ≈ 5–10 for concatenation approaches
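
The steps above can be sketched with toy tensors. This is a minimal shape-level illustration in NumPy, not the authors' implementation: toy_encode is a hypothetical stand-in for a real T5 encoder, the sizes d, L, and k are arbitrary toy values, and cross_attention is generic single-head scaled dot-product attention rather than T5's exact attention.

```python
from typing import List, Tuple

import numpy as np

rng = np.random.default_rng(0)
d = 8  # hidden size (toy value)
L = 5  # tokens per query+passage input (toy value)
k = 3  # number of retrieved passages (toy value)

# Hypothetical stand-in for encoder weights; a real system would use a T5 encoder.
W_enc = rng.normal(size=(d, d))


def toy_encode(token_embeddings: np.ndarray) -> np.ndarray:
    """Map one [L, d] input to [L, d] hidden states, seeing only this passage."""
    return np.tanh(token_embeddings @ W_enc)


def fuse(encoded_passages: List[np.ndarray]) -> np.ndarray:
    """Concatenate along the token axis: k arrays of [L, d] -> one [k*L, d] array."""
    return np.concatenate(encoded_passages, axis=0)


def cross_attention(query_state: np.ndarray, memory: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """One decoder state attends over every hidden state in the fused memory."""
    scores = memory @ query_state / np.sqrt(memory.shape[-1])  # shape [k*L]
    weights = np.exp(scores - scores.max())                    # stable softmax
    weights /= weights.sum()
    return weights @ memory, weights


# Independent encoding: one encoder call per query+passage input.
inputs = [rng.normal(size=(L, d)) for _ in range(k)]
encoded = [toy_encode(x) for x in inputs]

# Fusion in decoder: a single memory that spans all passages.
memory = fuse(encoded)
decoder_state = rng.normal(size=d)
context, weights = cross_attention(decoder_state, memory)

print(memory.shape)   # (15, 8)
print(context.shape)  # (8,)
print(weights.shape)  # (15,)
```

The attention weights form one distribution over all k × L positions, so a single decoding step can draw on evidence from any passage. The encoder, by contrast, never mixes tokens from different passages.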

Variants and history

FiD (2020) became the standard reader architecture for open-domain QA, substantially outperforming both single-passage readers and full-concatenation approaches. FiD-KD (Izacard & Grave, 2021) distilled knowledge from the reader into the retriever, using the reader's cross-attention scores as a training signal for retrieval. Atlas (2022) trained a FiD reader jointly with a dense retriever and showed strong few-shot performance. The independent encoding pattern is now common in RAG systems where context window limits require chunked processing.

When to use it

Use FiD when:

- Open-domain QA requires synthesizing information across many passages
- Context window constraints prevent concatenating all retrieved passages
- A generative answer is required (not just passage ranking)
- You have a T5-family model and can parallelize encoder passes
