Transformer
-
Transformer
Attention-based neural architecture with no recurrence; enables efficient parallel training and strong performance on language tasks. Introduced by Vaswani et al. (2017) in "Attention Is All You Need".
-
Self-Attention
Attention where query, key, and value vectors come from the same input sequence; enables capturing dependencies within a sequence.
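A minimal sketch of scaled dot-product self-attention in NumPy, assuming the projection matrices Wq, Wk, Wv are given (illustrative names, not from any particular library):

import numpy as np

def self_attention(x, Wq, Wk, Wv):
    # x: (seq_len, d_model); queries, keys, and values are all projections of the same x
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])            # (seq_len, seq_len) pairwise scores
    weights = np.exp(scores - scores.max(-1, keepdims=True))
    weights /= weights.sum(-1, keepdims=True)          # softmax over key positions
    return weights @ v                                 # each output is a weighted mix of values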
-
Positional Encoding
Injecting token position information into transformer inputs; allows the model to distinguish tokens by their position, since attention alone is order-invariant.
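A minimal sketch of the sinusoidal scheme from the original Transformer paper (assuming an even d_model); the result is added element-wise to the token embeddings:

import numpy as np

def sinusoidal_positions(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]                  # (seq_len, 1) token positions
    i = np.arange(d_model // 2)[None, :]               # (1, d_model/2) dimension pair index
    angles = pos / (10000 ** (2 * i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions: cosine
    return pe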
-
Multi-Head Attention
Multiple parallel attention mechanisms operating on different subspaces; enables learning diverse interaction patterns simultaneously.
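A minimal sketch building on the self_attention function above; the random projection matrices are placeholders for learned parameters:

import numpy as np

def multi_head_attention(x, n_heads):
    seq_len, d_model = x.shape
    d_head = d_model // n_heads                        # each head attends in a smaller subspace
    heads = []
    for _ in range(n_heads):
        Wq, Wk, Wv = (np.random.randn(d_model, d_head) for _ in range(3))
        heads.append(self_attention(x, Wq, Wk, Wv))    # (seq_len, d_head) per head
    Wo = np.random.randn(d_model, d_model)             # final output projection
    return np.concatenate(heads, axis=-1) @ Wo         # (seq_len, d_model)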
-
GPT
Generative Pre-trained Transformer; autoregressive decoder-only model for text generation and language understanding, a series of models released by OpenAI beginning in 2018.
-
Context Window
Maximum number of tokens a language model can process in one pass; determines how much context the model sees. Typical values range from 512 to 128k tokens.
-
BERT
Bidirectional Encoder Representations from Transformers; encoder-only transformer pre-trained with masked language modeling, foundational for many NLP tasks. Introduced by Devlin et al. (2018).