Attention
-
Transformer
Attention-based neural architecture that dispenses with recurrence, enabling parallel training over all sequence positions and strong performance on language tasks. Introduced by Vaswani et al. (2017) in "Attention Is All You Need".
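
A minimal sketch of one encoder layer, assuming a simplified single-head setting with hypothetical NumPy parameters (no biases, no positional encoding); it shows the two sublayers, self-attention and a position-wise feed-forward network, each wrapped in a residual connection with layer normalization:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 5, 16, 64
X = rng.standard_normal((seq_len, d_model))      # all positions processed in parallel

W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
W1 = rng.standard_normal((d_model, d_ff))
W2 = rng.standard_normal((d_ff, d_model))

# Self-attention sublayer, then residual connection and layer norm
Q, K, V = X @ W_q, X @ W_k, X @ W_v
attn = softmax(Q @ K.T / np.sqrt(d_model)) @ V
X = layer_norm(X + attn)

# Position-wise feed-forward sublayer (ReLU MLP), then residual and norm
X = layer_norm(X + np.maximum(0, X @ W1) @ W2)

Because no step depends on the previous position's output, every row of X is processed at once; this is the parallelism that recurrent models lack.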
-
Self-Attention
Attention in which the query, key, and value vectors are all derived from the same input sequence; lets each position attend to every other position, capturing dependencies within the sequence.
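
A minimal NumPy sketch; the projection matrices W_q, W_k, W_v are hypothetical learned parameters, and the point is that all three projections come from the same sequence X:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model = 5, 16
X = rng.standard_normal((seq_len, d_model))      # one input sequence

# Q, K, and V are all projections of the SAME sequence X
W_q, W_k, W_v = (rng.standard_normal((d_model, d_model)) for _ in range(3))
Q, K, V = X @ W_q, X @ W_k, X @ W_v

weights = softmax(Q @ K.T / np.sqrt(d_model))    # (seq_len, seq_len) position-to-position weights
out = weights @ V                                # each position is a mix of all positions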
-
Multi-Head Attention
Several attention heads run in parallel on different learned projections (subspaces) of the input, and their outputs are concatenated and projected back; this lets the model learn diverse interaction patterns simultaneously.
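
A minimal NumPy sketch with hypothetical dimensions, splitting the model width into per-head subspaces, attending in each head in parallel, then concatenating:

import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_head = d_model // n_heads                      # size of each head's subspace

X = rng.standard_normal((seq_len, d_model))
W_q, W_k, W_v, W_o = (rng.standard_normal((d_model, d_model)) for _ in range(4))

# Project, then reshape to (n_heads, seq_len, d_head) so all heads run in parallel
def split_heads(M):
    return M.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

Q, K, V = (split_heads(X @ W) for W in (W_q, W_k, W_v))

scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d_head)  # per-head attention scores
heads = softmax(scores) @ V                          # (n_heads, seq_len, d_head)

# Concatenate the heads and mix them with an output projection
out = heads.transpose(1, 0, 2).reshape(seq_len, d_model) @ W_o

Each head attends over a d_head-dimensional projection rather than the full d_model, so different heads can specialize in different relationships at the same compute cost as one full-width head.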
-
Attention Mechanism
Weighted aggregation of value vectors, with weights computed from the similarity between a query and each key; lets a model focus on the most relevant parts of its input. Fundamental to Transformers and modern NLP.
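
As a minimal sketch (NumPy; all names are illustrative), the weighting variant used by the Transformer is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V:

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Weighted aggregation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                      # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)       # softmax over the keys
    return weights @ V                                   # weighted sum of values

# 4 queries attend over 6 key/value pairs of dimension 8
Q, K, V = (np.random.randn(n, 8) for n in (4, 6, 6))
out = scaled_dot_product_attention(Q, K, V)              # shape (4, 8)

The softmax makes each output a convex combination of the values, so positions with higher query-key similarity contribute more to the result.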