Co-occurrence Matrix
What it is
A co-occurrence matrix X counts how often pairs of terms appear together within a fixed context window. Entry X_ij is the number of times term i co-occurs with term j. Co-occurrence matrices encode semantic relationships: under the distributional hypothesis, terms that frequently appear in the same contexts tend to be semantically related. Factorizing a co-occurrence matrix yields word embeddings (as in GloVe).
[illustrate: Small vocabulary matrix showing high counts for related terms (e.g., “cat” and “dog”), low counts for unrelated (“cat” and “spaceship”)]
How it works
Matrix construction:
- Define a context window (typically 5–10 tokens around the target)
- For each target word, count co-occurrences with the words in its window
- Build the matrix: rows/cols = vocabulary; X_ij = number of times terms i and j co-occur
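The construction steps above can be sketched in a few lines of Python (a minimal sketch; the toy sentence and the `cooccurrence_matrix` helper are illustrative, not from a real corpus):

```python
from collections import Counter

def cooccurrence_matrix(tokens, window=5):
    """Symmetric co-occurrence counts within a fixed context window."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    counts = Counter()
    for pos, target in enumerate(tokens):
        lo, hi = max(0, pos - window), min(len(tokens), pos + window + 1)
        for ctx in range(lo, hi):   # every position inside the window
            if ctx != pos:          # skip the target itself
                counts[(index[target], index[tokens[ctx]])] += 1
    n = len(vocab)
    X = [[counts.get((i, j), 0) for j in range(n)] for i in range(n)]
    return vocab, X

tokens = "the dog chased the cat and the cat ate the food".split()
vocab, X = cooccurrence_matrix(tokens, window=2)
```

Because each (target, context) pair is counted in both directions, the resulting matrix is symmetric.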
Interpretation:
- High X_ij: terms i and j frequently co-occur (semantically related)
- Low X_ij: rare co-occurrence (unrelated terms)
Factorization:
- GloVe learns embeddings by approximately factorizing X
- Fits w_i · w̃_j + b_i + b̃_j ≈ log X_ij, so dot products between word vectors track log co-occurrence counts
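GloVe trains with a weighted least-squares objective rather than an exact factorization. As a rough illustration of the factorization idea only (not GloVe itself), a truncated SVD of the log-transformed counts also yields low-dimensional word vectors; the 3×3 matrix here is a toy example:

```python
import numpy as np

# Toy symmetric co-occurrence counts (3-word vocabulary).
X = np.array([[0.,  8., 12.],
              [8.,  0., 10.],
              [12., 10., 0.]])

# Dampen raw counts so large entries don't dominate the fit.
M = np.log1p(X)  # log(1 + X) keeps zero counts at 0

# Truncated SVD gives the best rank-d approximation of M.
U, S, Vt = np.linalg.svd(M)
d = 2
W = U[:, :d] * np.sqrt(S[:d])                  # d-dimensional word vectors
M_approx = U[:, :d] @ np.diag(S[:d]) @ Vt[:d]  # rank-d reconstruction
```

Rows of `W` can then be compared with cosine similarity, just like rows of the raw matrix, but in far fewer dimensions.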
Example
Vocabulary: {dog, cat, food, color, run, jump}
Context window: 5 tokens
Co-occurrence matrix (sample):
         dog   cat  food  color  run  jump
dog        0     8    12      2    5     4
cat        8     0    10      3    4     3
food      12    10     0      5    2     3
color      2     3     5      0    1     1
run        5     4     2      1    0     8
jump       4     3     3      1    8     0
# High counts: (dog, cat), (dog, food), (cat, food)
# → Semantic cluster: animals and food
# Low counts: (dog, color), (color, run)
# → Weak semantic relation
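The strongest and weakest pairs noted in the comments above can be read off mechanically; a minimal NumPy sketch, with the values copied from the sample matrix:

```python
import numpy as np

vocab = ["dog", "cat", "food", "color", "run", "jump"]
X = np.array([[0,  8, 12, 2, 5, 4],
              [8,  0, 10, 3, 4, 3],
              [12, 10, 0, 5, 2, 3],
              [2,  3,  5, 0, 1, 1],
              [5,  4,  2, 1, 0, 8],
              [4,  3,  3, 1, 8, 0]], dtype=float)

# Strongest association: largest entry (the diagonal is already zero).
hi_i, hi_j = divmod(int(np.argmax(X)), len(vocab))

# Weakest association: smallest off-diagonal entry (mask the diagonal first).
masked = X + np.diag([np.inf] * len(vocab))
lo_i, lo_j = divmod(int(np.argmin(masked)), len(vocab))
```

This recovers (dog, food) as the strongest pair and (color, run) as the weakest, matching the annotations above.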
Variants and history
Co-occurrence matrices date to distributional semantics (Harris, 1954). PMI weighting emphasizes informative associations over raw frequency, so common words like "the" do not dominate. Positive Pointwise Mutual Information (PPMI) clips negative PMI values to zero, replacing raw counts with association strength. GloVe (Pennington et al., 2014) combines global co-occurrence statistics with local context windows. Modern embedding methods (Word2Vec skip-gram, fastText) implicitly capture co-occurrence statistics through local context windows.
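The PPMI reweighting mentioned above computes PMI(i, j) = log[P(i, j) / (P(i) P(j))] from the count matrix and clips it at zero. A minimal sketch using toy counts (not real corpus data):

```python
import numpy as np

def ppmi(X):
    """Positive PMI: max(0, log P(i,j) / (P(i) P(j))) from a count matrix."""
    total = X.sum()
    p_ij = X / total
    p_i = X.sum(axis=1, keepdims=True) / total
    p_j = X.sum(axis=0, keepdims=True) / total
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(p_ij / (p_i * p_j))
    pmi[~np.isfinite(pmi)] = 0.0   # zero counts -> PMI treated as 0
    return np.maximum(pmi, 0.0)

X = np.array([[0.,  8., 12.],
              [8.,  0., 10.],
              [12., 10., 0.]])
M = ppmi(X)
```

Factorizing the PPMI matrix instead of raw counts typically produces better embeddings, since frequent but uninformative pairs are downweighted.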
When to use it
Use co-occurrence matrices for:
- Learning word embeddings via factorization
- Linguistic analysis of semantic relationships
- Identifying collocations and phrases
- Corpus analysis and word similarity
- Understanding context-dependent meaning
Co-occurrence analysis is interpretable, but it requires tuning the context window size, handling rare words, and managing sparsity: the matrix grows quadratically with vocabulary size.