GloVe

What it is

GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm for word embeddings published by Pennington et al. (2014) at Stanford. It merges two dominant paradigms: global matrix factorization (like LSA) and local context-window methods (like skip-gram). The result is fast training and embeddings that capture both semantic and syntactic structure.

[illustrate: Word co-occurrence matrix X, factorization into word vectors W and context vectors C, with optimization landscape showing convergence]

How it works

  1. Co-occurrence matrix: Count how often word pairs appear together in a context window across the corpus, producing an X_ij matrix.
  2. Weighted factorization: Solve for embedding matrices W and C whose inner products approximate log X, minimizing a weighted least-squares loss that trusts frequent co-occurrences more than rare, noisy ones.
  3. Loss function: J = Σ_{i,j : X_ij > 0} f(X_ij) (w_i · c_j + b_i + b̃_j − log X_ij)², where b_i and b̃_j are word and context bias terms; the sum runs only over nonzero co-occurrence counts, so log X_ij is always defined.
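Step 1 can be sketched as a symmetric context window whose counts decay with distance from the center word (1/d, as in the original paper). The toy corpus and function name here are illustrative:

```python
from collections import defaultdict

def cooccurrence(corpus, window=2):
    """Build weighted co-occurrence counts X[(word, context)].

    A pair at distance d contributes 1/d, so nearer neighbors
    count more; counts are accumulated symmetrically.
    """
    X = defaultdict(float)
    for tokens in corpus:
        for i, w in enumerate(tokens):
            for d in range(1, window + 1):
                j = i + d
                if j < len(tokens):
                    X[(w, tokens[j])] += 1.0 / d
                    X[(tokens[j], w)] += 1.0 / d
    return X

corpus = [["the", "cat", "sat", "on", "the", "mat"]]
X = cooccurrence(corpus)
```

Real implementations index words by integer id and stream the corpus, but the counting logic is the same.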

The weighting function f(X_ij) increases with frequency but is capped: in the original paper, f(x) = (x / x_max)^α for x < x_max and 1 otherwise, with x_max = 100 and α = 0.75. This keeps very common pairs from dominating the loss while down-weighting rare, noisy counts.
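A minimal sketch of the weighting function and the objective above, assuming co-occurrences are stored as a dict of nonzero counts keyed by integer index pairs (`glove_weight` and `glove_loss` are illustrative names, not from a library):

```python
import numpy as np

def glove_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: (x / x_max)^alpha below the cap, 1 above it
    (x_max = 100 and alpha = 0.75 are the paper's values)."""
    return (x / x_max) ** alpha if x < x_max else 1.0

def glove_loss(W, C, b, b_c, X):
    """Weighted least-squares objective over nonzero counts.

    W, C  : (vocab, dim) word and context embedding matrices
    b, b_c: (vocab,) word and context bias vectors
    X     : dict mapping (i, j) index pairs to co-occurrence counts
    """
    J = 0.0
    for (i, j), x in X.items():
        err = W[i] @ C[j] + b[i] + b_c[j] - np.log(x)
        J += glove_weight(x) * err ** 2
    return J
```

Training then minimizes this loss with stochastic gradient descent (AdaGrad in the original paper), sampling nonzero entries of X.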

Example

GloVe vectors exhibit the same additive compositionality as skip-gram vectors:

  • vector("king") - vector("man") + vector("woman") ≈ vector("queen")
  • Vector differences capture analogies learned from global co-occurrence patterns

Pre-trained 300-dimensional GloVe vectors on Common Crawl achieve strong performance on word similarity and analogy benchmarks.
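The analogy arithmetic above can be sketched with a nearest-neighbor search under cosine similarity. The tiny hand-made 2-d vectors below are purely illustrative (real GloVe vectors are learned, 50 to 300 dimensional):

```python
import numpy as np

# Toy vocabulary with hand-made 2-d vectors (illustrative only)
vecs = {
    "king":  np.array([0.9, 0.8]),
    "queen": np.array([0.9, 0.2]),
    "man":   np.array([0.1, 0.8]),
    "woman": np.array([0.1, 0.2]),
}

def analogy(a, b, c, vecs):
    """Return the word closest to vec(a) - vec(b) + vec(c) by cosine
    similarity, excluding the three query words themselves."""
    target = vecs[a] - vecs[b] + vecs[c]
    def cos(u, v):
        return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in vecs if w not in {a, b, c}),
               key=lambda w: cos(vecs[w], target))

print(analogy("king", "man", "woman", vecs))  # → queen
```

With real pretrained vectors the same query words work; excluding the inputs matters because the raw nearest neighbor of the target is usually one of them.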

Variants and history

GloVe appeared in 2014 as an alternative to Word2Vec's skip-gram, and both are now classics of the word-embedding canon. GloVe's insight that global co-occurrence statistics matter influenced later work on embeddings; modern variants combine GloVe-style factorization with neural optimization, and contextual models such as BERT have largely superseded static embeddings for most tasks.

When to use it

Choose GloVe when:

  • You need pretrained, static word vectors
  • Computational efficiency is important
  • Both global and local context matter
  • You want to avoid training from scratch
  • Interpretability and stability matter

GloVe vectors are stable and well studied, but context-agnostic: every occurrence of a word gets the same vector. For semantic nuance or domain-specific vocabulary, fine-tuned contextual embeddings typically outperform them.

See also