Co-occurrence Matrix
What it is
A co-occurrence matrix X counts how often pairs of terms appear together within a fixed context window. Entry X_ij is the number of times term i co-occurs with term j. Co-occurrence matrices encode semantic relationships: under the distributional hypothesis, terms that frequently appear in the same contexts tend to be semantically related. Factorizing a co-occurrence matrix yields word embeddings (as in GloVe).
[illustrate: Small vocabulary matrix showing high counts for related terms (e.g., “cat” and “dog”), low counts for unrelated (“cat” and “spaceship”)]
How it works
Matrix construction:
- Define a context window (typically 5–10 tokens around the target)
- For each target word, count co-occurrences with the words in its window
- Build the matrix: rows/cols = vocabulary; X_ij = number of times terms i and j co-occur
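The construction steps above can be sketched in a few lines of Python (a minimal sketch; the toy sentence and the `cooccurrence_matrix` helper are illustrative, not from a real corpus):

```python
from collections import Counter

def cooccurrence_matrix(tokens, window=5):
    """Symmetric co-occurrence counts within a fixed context window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for pos, target in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):   # every position inside the window
            if ctx != pos:          # skip the target itself
                counts[(index[target], index[tokens[ctx]])] += 1
    n = len(vocab)
    X = [[counts.get((i, j), 0) for j in range(n)] for i in range(n)]
    return vocab, X

tokens = "the dog chased the cat and the cat ate the food".split()
vocab, X = cooccurrence_matrix(tokens, window=2)
```

Because each (target, context) pair is counted in both directions, the resulting matrix is symmetric.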
Interpretation:
- High X_ij: terms i and j frequently co-occur (semantically related)
- Low X_ij: rare co-occurrence (unrelated terms)
Factorization:
- GloVe learns embeddings by approximately factorizing X
- Fits w_i · w̃_j + b_i + b̃_j ≈ log X_ij, so dot products between word vectors track log co-occurrence counts
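GloVe trains with a weighted least-squares objective rather than an exact factorization. As a rough illustration of the factorization idea only (not GloVe itself), a truncated SVD of the log-transformed counts also yields low-dimensional word vectors; the 3×3 matrix here is a toy example:

```python
import numpy as np

# Toy symmetric co-occurrence counts (3-word vocabulary).
X = np.array([[0.,  8., 12.],
              [8.,  0., 10.],
              [12., 10., 0.]])

# Dampen raw counts so large entries don't dominate the fit.
M = np.log1p(X)  # log(1 + X) keeps zero counts at 0

# Truncated SVD gives the best rank-d approximation of M.
U, S, Vt = np.linalg.svd(M)
d = 2
W = U[:, :d] * np.sqrt(S[:d])                  # d-dimensional word vectors
M_approx = U[:, :d] @ np.diag(S[:d]) @ Vt[:d]  # rank-d reconstruction
```

Rows of `W` can then be compared with cosine similarity, just like rows of the raw matrix, but in far fewer dimensions.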
Example
Vocabulary: {dog, cat, food, color, run, jump}
Context window: 5 tokens
Co-occurrence matrix (sample):
         dog   cat  food  color  run  jump
dog        0     8    12      2    5     4
cat        8     0    10      3    4     3
food      12    10     0      5    2     3
color      2     3     5      0    1     1
run        5     4     2      1    0     8
jump       4     3     3      1    8     0
# High counts: (dog, cat), (dog, food), (cat, food)
# → Semantic cluster: animals and food
# Low counts: (dog, color), (color, run)
# → Weak semantic relation
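The strongest and weakest pairs noted in the comments above can be read off mechanically; a minimal NumPy sketch, with the values copied from the sample matrix:

```python
import numpy as np

vocab = ["dog", "cat", "food", "color", "run", "jump"]
X = np.array([[0,  8, 12, 2, 5, 4],
              [8,  0, 10, 3, 4, 3],
              [12, 10, 0, 5, 2, 3],
              [2,  3,  5, 0, 1, 1],
              [5,  4,  2, 1, 0, 8],
              [4,  3,  3, 1, 8, 0]], dtype=float)

# Strongest association: largest entry (the diagonal is already zero).
hi_i, hi_j = divmod(int(np.argmax(X)), len(vocab))

# Weakest association: smallest off-diagonal entry (mask the diagonal first).
masked = X + np.diag([np.inf] * len(vocab))
lo_i, lo_j = divmod(int(np.argmin(masked)), len(vocab))
```

This recovers (dog, food) as the strongest pair and (color, run) as the weakest, matching the annotations above.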
Variants and history
Co-occurrence matrices date to distributional semantics (Harris, 1954). PMI weighting emphasizes informative associations over raw frequency, so common words like "the" do not dominate. Positive Pointwise Mutual Information (PPMI) clips negative PMI values to zero, replacing raw counts with association strength. GloVe (Pennington et al., 2014) combines global co-occurrence statistics with local context windows. Modern embedding methods (Word2Vec skip-gram, fastText) implicitly capture co-occurrence statistics through local context windows.
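The PPMI reweighting mentioned above computes PMI(i, j) = log[P(i, j) / (P(i) P(j))] from the count matrix and clips it at zero. A minimal sketch using toy counts (not real corpus data):

```python
import numpy as np

def ppmi(X):
    """Positive PMI: max(0, log P(i,j) / (P(i) P(j))) from a count matrix."""
    total = X.sum()
    p_ij = X / total
    p_i = X.sum(axis=1, keepdims=True) / total
    p_j = X.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts -> PMI treated as 0
    return np.maximum(pmi, 0.0)

X = np.array([[0.,  8., 12.],
              [8.,  0., 10.],
              [12., 10., 0.]])
M = ppmi(X)
```

Factorizing the PPMI matrix instead of raw counts typically produces better embeddings, since frequent but uninformative pairs are downweighted.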
When to use it
Use co-occurrence matrices for:
- Learning word embeddings via factorization
- Linguistic analysis of semantic relationships
- Identifying collocations and phrases
- Corpus analysis and word similarity
- Understanding context-dependent meaning
Co-occurrence analysis is interpretable, but it requires tuning the context window size, handling rare words, and managing sparsity: the matrix grows quadratically with vocabulary size.