Embedding
-
Word2Vec
Efficient neural method for learning word embeddings using skip-gram or CBOW objectives, published by Mikolov et al. in 2013.
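A minimal sketch of training skip-gram embeddings with the gensim library; the toy corpus and hyperparameters are illustrative, not values from the original paper.

```python
from gensim.models import Word2Vec

# Toy corpus: each document is a list of tokens (illustrative only).
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["cats", "and", "dogs", "are", "pets"],
]

# sg=1 selects the skip-gram objective; sg=0 would select CBOW.
model = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=1, epochs=50)

vec = model.wv["cat"]                         # dense vector for a word
print(vec.shape)                              # (50,)
print(model.wv.most_similar("cat", topn=2))   # nearest neighbours in embedding space
```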
-
Word Embedding
Dense vector representation of a word in low-dimensional space, capturing semantic and syntactic relationships.
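At its simplest, an embedding table is a matrix with one row per vocabulary word; a plain NumPy sketch (the vocabulary and random vectors are placeholders for trained values):

```python
import numpy as np

vocab = {"cat": 0, "dog": 1, "mat": 2}       # word -> row index
rng = np.random.default_rng(0)
E = rng.normal(size=(len(vocab), 8))          # embedding matrix, 8-dim for illustration

def embed(word: str) -> np.ndarray:
    """Look up the dense vector for a word."""
    return E[vocab[word]]

print(embed("cat"))   # one dense 8-dimensional vector
```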
-
SPLADE
Sparse Lexical and Expansion model; learns sparse vocabulary-space embeddings that remain compatible with inverted indexes while capturing learned term expansion and semantics.
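A rough sketch of how a SPLADE-style sparse vector is produced from a masked-language-model head: apply log(1 + ReLU(·)) to the MLM logits and max-pool over token positions. The checkpoint name is one published SPLADE model; the rest is an illustrative reading of the paper's formulation, not the official implementation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

name = "naver/splade-cocondenser-ensembledistil"  # one public SPLADE checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForMaskedLM.from_pretrained(name)

batch = tok("sparse lexical retrieval", return_tensors="pt")  # single sequence, no padding
with torch.no_grad():
    logits = model(**batch).logits                # (1, seq_len, vocab_size)

# SPLADE-max pooling: w_j = max_i log(1 + ReLU(logit_ij))
weights = torch.log1p(torch.relu(logits)).amax(dim=1).squeeze(0)  # (vocab_size,)
nonzero = weights.nonzero().squeeze(-1)
print(len(nonzero), "active vocabulary terms")    # sparse: most entries are zero
```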
-
Sentence Embedding
Dense vector representation of a sentence or passage, aggregating token information into a single low-dimensional vector that preserves semantic meaning.
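A common way to obtain sentence embeddings is the sentence-transformers library; the model name below is a popular public checkpoint, used purely as an example.

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # example checkpoint, 384-dim output
emb = model.encode(["The cat sat on the mat.", "A dog lay on the rug."])
print(emb.shape)   # (2, 384): one dense vector per sentence
```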
-
Matryoshka Representation Learning
Training method that makes every prefix of an embedding vector a useful lower-dimensional embedding in its own right; enables efficient storage and search at multiple granularities. Abbreviated MRL.
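The practical payoff is that an embedding can be truncated to its first k dimensions and re-normalised; a sketch (the random vector stands in for an MRL-trained embedding, without which truncation would lose quality):

```python
import numpy as np

def truncate(v: np.ndarray, k: int) -> np.ndarray:
    """Keep the first k dimensions and re-normalise to unit length."""
    p = v[:k]
    return p / np.linalg.norm(p)

full = np.random.default_rng(0).normal(size=768)   # stand-in for an MRL-trained embedding
full /= np.linalg.norm(full)
small = truncate(full, 64)   # 12x smaller index footprint, same vector family
print(small.shape)           # (64,)
```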
-
GloVe
Global Vectors for Word Representation; learns embeddings by factorising global word co-occurrence statistics, combining the strengths of matrix-factorisation and local context-window methods.
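GloVe's objective in code form, on a toy co-occurrence matrix; this transcribes the paper's weighted least-squares loss, with the paper's default weighting parameters and illustrative random inputs.

```python
import numpy as np

def glove_loss(X, W, W_ctx, b, b_ctx, x_max=100.0, alpha=0.75):
    """Weighted least-squares objective from Pennington et al. (2014)."""
    loss = 0.0
    for i, j in zip(*np.nonzero(X)):                   # only observed co-occurrences
        f = min((X[i, j] / x_max) ** alpha, 1.0)       # down-weight rare pairs, cap frequent ones
        err = W[i] @ W_ctx[j] + b[i] + b_ctx[j] - np.log(X[i, j])
        loss += f * err ** 2
    return loss

rng = np.random.default_rng(0)
V, d = 4, 8                                            # toy vocabulary size and dimension
X = rng.integers(0, 5, size=(V, V)).astype(float)      # toy co-occurrence counts
print(glove_loss(X, rng.normal(size=(V, d)), rng.normal(size=(V, d)),
                 np.zeros(V), np.zeros(V)))
```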
-
fastText
Word embedding method using character n-grams to handle out-of-vocabulary words and morphological variants; published by Bojanowski et al. in 2017.
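The core trick is representing a word by its character n-grams plus the word itself, with `<` and `>` as boundary markers; a sketch of the n-gram extraction (the 3-to-6 range matches the paper's default):

```python
def char_ngrams(word: str, n_min: int = 3, n_max: int = 6) -> list[str]:
    """Character n-grams with boundary markers, as in fastText."""
    marked = f"<{word}>"
    grams = [marked[i:i + n] for n in range(n_min, n_max + 1)
             for i in range(len(marked) - n + 1)]
    return grams + [marked]   # the full word is kept as its own feature

print(char_ngrams("where", 3, 3))   # ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

A word's vector is then the sum of its n-gram vectors, which is how unseen and morphologically related words still receive sensible representations.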
-
Dot Product Similarity
Inner product of two vectors; equivalent to cosine similarity when the vectors are unit-normalised, and the standard fast scoring function in dense retrieval.
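A quick numerical check of the equivalence claimed above:

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)

cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))   # cosine of the raw vectors
a_hat, b_hat = a / np.linalg.norm(a), b / np.linalg.norm(b)
dot = a_hat @ b_hat                                      # plain dot product after unit-normalising

print(np.isclose(dot, cos))   # True: on unit vectors the two coincide
```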
-
Dense Retrieval
Retrieval method using nearest-neighbour search over dense embedding vectors; contrasts with inverted-index sparse retrieval like BM25.
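Stripped to its essentials, dense retrieval is a matrix-vector product followed by top-k selection; a brute-force NumPy sketch (real systems replace this with an approximate index such as HNSW or IVF):

```python
import numpy as np

rng = np.random.default_rng(0)
docs = rng.normal(size=(10_000, 384))              # stand-in pre-computed document embeddings
docs /= np.linalg.norm(docs, axis=1, keepdims=True)
query = rng.normal(size=384)
query /= np.linalg.norm(query)

scores = docs @ query                               # one dot product per document
top_k = np.argsort(-scores)[:5]                     # indices of the 5 best matches
print(top_k, scores[top_k])
```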
-
ColBERT
Contextualized Late Interaction over BERT; late-interaction ranking using per-token embeddings with MaxSim scoring for efficient dense retrieval.
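The MaxSim rule in NumPy: for each query token, take its maximum similarity over all document tokens, then sum across query tokens. The scoring rule is from the ColBERT paper; the random matrices stand in for real per-token embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.normal(size=(8, 128))     # 8 query token embeddings (stand-ins)
D = rng.normal(size=(120, 128))   # 120 document token embeddings (stand-ins)
Q /= np.linalg.norm(Q, axis=1, keepdims=True)
D /= np.linalg.norm(D, axis=1, keepdims=True)

sim = Q @ D.T                     # (8, 120) token-to-token similarities
score = sim.max(axis=1).sum()     # MaxSim: best doc token per query token, summed
print(score)
```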
-
Bi-Encoder
Neural architecture encoding query and document independently into separate embeddings, enabling fast retrieval via approximate nearest-neighbour search.
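A bi-encoder in a few lines with sentence-transformers: query and documents never see each other inside the model, so document vectors can be pre-computed and indexed. The model name is an example checkpoint.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")    # example checkpoint

doc_emb = model.encode(["ColBERT uses late interaction.",
                        "GloVe factorises co-occurrence counts."])   # offline, indexable
query_emb = model.encode("What is late interaction?")                # online, independent

scores = util.cos_sim(query_emb, doc_emb)          # similarity computed only afterwards
print(scores)
```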