GloVe
What it is
GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for word embeddings published by Pennington et al. (2014) at Stanford. It merges two dominant paradigms: global matrix factorization (like LSA) and local context-window methods (like skip-gram). The result is fast training and embeddings that capture both semantic and syntactic structure.
[illustrate: Word co-occurrence matrix X, factorization into word vectors W and context vectors C, with optimization landscape showing convergence]
How it works
- Co-occurrence matrix: Count how often each word pair appears together within a context window across the corpus, producing a matrix X whose entry X_ij is the number of times word j occurs in the context of word i.
- Weighted factorization: Solve for embedding matrices W and C that approximate X, minimizing a weighted least-squares loss. Weighting emphasizes frequent co-occurrences while not over-weighting rare pairs.
- Loss function:
J = Σ_ij f(X_ij) (w_i · c_j + b_i + b̃_j - log X_ij)^2
where w_i and c_j are the word and context vectors, and b_i and b̃_j are their separate bias terms. The weighting function f(X_ij) increases with frequency but is capped at 1 so very common pairs do not dominate; the paper uses f(x) = (x/x_max)^α for x < x_max and 1 otherwise, with x_max = 100 and α = 3/4.
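The steps above can be sketched end to end in NumPy. This is a minimal illustration, not the paper's implementation: the co-occurrence counts are random stand-ins for a real corpus scan, the vocabulary size, dimension, and learning rate are toy values, and plain full-batch gradient descent replaces the AdaGrad optimizer used in the paper.

```python
import numpy as np

def weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting f(X_ij): grows as (x/x_max)^alpha, capped at 1."""
    return np.where(x < x_max, (x / x_max) ** alpha, 1.0)

def loss_and_grads(X, W, C, bw, bc):
    """Weighted least-squares loss over the nonzero cells of X,
    plus gradients for full-batch gradient descent."""
    i, j = np.nonzero(X)
    x = X[i, j]
    # residual: w_i . c_j + b_i + b~_j - log X_ij
    r = np.sum(W[i] * C[j], axis=1) + bw[i] + bc[j] - np.log(x)
    f = weight(x)
    g = 2.0 * f * r
    gW, gC = np.zeros_like(W), np.zeros_like(C)
    gbw, gbc = np.zeros_like(bw), np.zeros_like(bc)
    np.add.at(gW, i, g[:, None] * C[j])   # accumulate per word row
    np.add.at(gC, j, g[:, None] * W[i])   # accumulate per context row
    np.add.at(gbw, i, g)
    np.add.at(gbc, j, g)
    return np.sum(f * r**2), (gW, gC, gbw, gbc)

# Toy setup: random counts stand in for real corpus co-occurrences.
rng = np.random.default_rng(0)
V, d, lr = 6, 4, 0.01            # vocab size, dim, learning rate (toy values)
X = rng.integers(0, 50, size=(V, V)).astype(float)
W = 0.1 * rng.standard_normal((V, d))
C = 0.1 * rng.standard_normal((V, d))
bw, bc = np.zeros(V), np.zeros(V)

losses = []
for _ in range(500):
    loss, (gW, gC, gbw, gbc) = loss_and_grads(X, W, C, bw, bc)
    losses.append(loss)
    W -= lr * gW; C -= lr * gC; bw -= lr * gbw; bc -= lr * gbc

embeddings = W + C   # the paper sums word and context vectors
```

Note that the loss sums only over nonzero cells of X: log X_ij is undefined for zero counts, which is part of why the weighting function f (with f(0) = 0) is needed.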
Example
GloVe vectors exhibit the same compositionality as skip-gram:
vector("king") - vector("man") + vector("woman") ≈ vector("queen")
- Vector differences capture analogies learned from global co-occurrence patterns
Pre-trained 300-dimensional GloVe vectors on Common Crawl achieve strong performance on word similarity and analogy benchmarks.
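The analogy query can be run with plain vector arithmetic plus cosine similarity. The embeddings below are hand-made three-dimensional toys whose axes happen to encode (male, female, royal), so the arithmetic works out exactly; real GloVe vectors are dense 50- to 300-dimensional floats, but the query code is the same.

```python
import numpy as np

# Hypothetical toy vectors; real GloVe embeddings are learned, not designed.
vecs = {
    "king":  np.array([1.0, 0.0, 1.0]),
    "queen": np.array([0.0, 1.0, 1.0]),
    "man":   np.array([1.0, 0.0, 0.0]),
    "woman": np.array([0.0, 1.0, 0.0]),
    "apple": np.array([0.3, 0.3, 0.0]),
}

def analogy(a, b, c, vecs):
    """Return the word nearest (by cosine) to vec(a) - vec(b) + vec(c),
    excluding the query words themselves, as is standard for analogy tests."""
    target = vecs[a] - vecs[b] + vecs[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9)
    candidates = {w: v for w, v in vecs.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cos(candidates[w], target))

print(analogy("king", "man", "woman", vecs))  # -> queen
```

Excluding the query words from the candidate set matters: vec(c) itself is often the nearest neighbor of the target, so standard analogy benchmarks skip it.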
Variants and history
GloVe appeared in 2014 as an alternative to Word2Vec’s skip-gram. Both methods are now considered classics in the word embedding canon. GloVe’s insight—that global matrix information matters—influenced later work on contextualized embeddings. Modern variants combine GloVe-style factorization with neural optimization, and contextual models like BERT have largely superseded static embeddings for most tasks.
When to use it
Choose GloVe when:
- You need pretrained, static word vectors
- Computational efficiency is important
- Both global and local context matter
- You want to avoid training from scratch
- Interpretability and stability matter
GloVe vectors are stable and well-studied, but context-agnostic: each word gets a single vector regardless of sense. For semantic nuance or domain-specific terms, fine-tuned contextual embeddings may outperform them.
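When using pretrained vectors, the distributed GloVe files (e.g. glove.6B.300d.txt) are plain text: one word per line followed by its space-separated vector components, with no header. A minimal loader, tested here on two fabricated lines standing in for a real downloaded file:

```python
import io
import numpy as np

def load_glove(lines):
    """Parse GloVe's plain-text format: `word v1 v2 ... vd` per line."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # first token is the word, the rest are vector components
        embeddings[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings

# Fabricated 3-d sample; real files are 50-300 dimensions.
sample = io.StringIO("the 0.1 0.2 0.3\ncat 0.4 0.5 0.6\n")
emb = load_glove(sample)
```

For large vocabularies in production, a library loader (e.g. gensim's KeyedVectors) is usually preferable to a hand-rolled parser.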