Pointwise Mutual Information
What it is
Pointwise Mutual Information (PMI) measures the association between two terms: how much more (or less) often they co-occur than would be expected if they were independent. PMI = log(P(i,j) / (P(i) × P(j))). Positive PMI indicates association (for example, a collocation), zero indicates independence, and negative PMI indicates that the terms co-occur less often than chance predicts.
[illustrate: Scatter plot of two words with PMI scores; collocations (high PMI) vs. random co-occurrences (low PMI)]
How it works
PMI formula:
PMI(i, j) = log(P(i, j) / (P(i) × P(j)))
= log(count(i, j) × N / (count(i) × count(j)))
Where:
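- P(i, j): joint probability, estimated as count(i, j) / N
- P(i), P(j): marginal probabilities, estimated as count(i) / N and count(j) / N
- N: corpus size (total number of tokens)
- log: natural logarithm in the example below; base 2 is also common and gives PMI in bits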
Interpretation:
- PMI > 0: co-occur more than expected (association)
- PMI = 0: independent (no association)
- PMI < 0: co-occur less than expected (mutual exclusion)
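- No fixed range: PMI tends to −∞ as the co-occurrence count goes to zero, and its maximum, min(−log P(i), −log P(j)), grows as the terms get rarer, so very rare pairs can have extreme PMI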
Issues:
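- Biased toward rare pairs (low individual counts in the denominator inflate PMI)
- Negative values are unreliable on sparse data and undefined (−∞) for pairs that never co-occur; Positive PMI (PPMI) = max(0, PMI) clips them to zero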
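The count-based form of the formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `pmi` and `ppmi` are hypothetical, the natural logarithm is assumed, and a pair that never co-occurs is mapped to negative infinity.

```python
import math


def pmi(count_ij: int, count_i: int, count_j: int, n: int) -> float:
    """PMI from raw counts: log(count_ij * N / (count_i * count_j)), natural log."""
    if count_i <= 0 or count_j <= 0 or n <= 0:
        raise ValueError("count_i, count_j and n must be positive")
    if count_ij == 0:
        # The pair never co-occurs, so log(0) is taken as negative infinity.
        return float("-inf")
    return math.log(count_ij * n / (count_i * count_j))


def ppmi(count_ij: int, count_i: int, count_j: int, n: int) -> float:
    """Positive PMI: negative (and -inf) values are clipped to zero."""
    return max(0.0, pmi(count_ij, count_i, count_j, n))
```

Calling `pmi(1, 1, 1, 1_000_000)` shows the rare-pair bias: two words that each occur once, together, score log(1,000,000), far above any frequent pair in the same corpus.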
Example
Corpus: 1M tokens
Term pair: ("strong", "tea")
count(strong, tea) = 50
count(strong) = 10,000
count(tea) = 5,000
N = 1M
P(strong, tea) = 50 / 1M = 0.00005
P(strong) = 10k / 1M = 0.01
P(tea) = 5k / 1M = 0.005
PMI = log(0.00005 / (0.01 × 0.005))
= log(0.00005 / 0.00005)
= log(1) = 0
# Try ("strong", "coffee"):
count(strong, coffee) = 100 (more co-occurrence)
count(coffee) = 3,000
P(coffee) = 3k / 1M = 0.003
PMI = log((100 / 1M) / (0.01 × 0.003))
= log(0.0001 / 0.00003)
= log(3.33) ≈ 1.2 (natural log; positive association, collocation)
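The arithmetic in this example can be checked with a short script. It is a sketch under stated assumptions: the dictionary names and `pmi_from_counts` are hypothetical, and count(coffee) = 3,000 is inferred from the P(coffee) = 0.003 used in the example rather than given directly.

```python
import math

N = 1_000_000  # corpus size in tokens, as in the example

# Individual term counts; the count for "coffee" is inferred from P(coffee) = 0.003.
term_counts = {"strong": 10_000, "tea": 5_000, "coffee": 3_000}

# Co-occurrence counts for the two pairs in the example.
pair_counts = {("strong", "tea"): 50, ("strong", "coffee"): 100}


def pmi_from_counts(pair, log=math.log):
    """PMI of a pair from the tables above; `log` selects the logarithm base."""
    first, second = pair
    ratio = pair_counts[pair] * N / (term_counts[first] * term_counts[second])
    return log(ratio)


for pair in pair_counts:
    nats = pmi_from_counts(pair)             # natural log, as in the example
    bits = pmi_from_counts(pair, math.log2)  # base 2, PMI in bits
    print(pair, round(nats, 3), round(bits, 3))
```

The choice of base only rescales the scores: ("strong", "coffee") comes out near 1.20 with the natural log and near 1.74 in bits, while an independent pair is 0 in any base.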
Variants and history
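PMI originates in information theory (Fano, 1961) and was brought into computational linguistics by Church and Hanks (1990) as a word association measure; Turney (2001) applied it to synonym detection using web search counts. PPMI (Positive PMI) clips negative values to zero, since negative PMI is poorly estimated from sparse counts. Normalized PMI (NPMI) divides PMI by −log P(i, j), which scales it to [−1, 1] for comparability across pairs. The t-test and chi-square test offer alternative association measures. PMI is widely used in collocation detection and semantic analysis.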
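As an illustration of the normalized variant, the sketch below assumes the common definition NPMI = PMI / (−log P(i, j)), which ranges from −1 (the terms never co-occur) through 0 (independence) to 1 (the terms only ever occur together). The function name `npmi` and its probability-based signature are hypothetical.

```python
import math


def npmi(p_ij: float, p_i: float, p_j: float) -> float:
    """Normalized PMI, assuming NPMI = PMI / (-log P(i, j))."""
    if not (0.0 < p_i <= 1.0 and 0.0 < p_j <= 1.0):
        raise ValueError("marginal probabilities must be in (0, 1]")
    if not (0.0 <= p_ij < 1.0):
        # P(i, j) = 1 would make the normalizer -log(1) = 0.
        raise ValueError("joint probability must be in [0, 1)")
    if p_ij == 0.0:
        # Limit value when the pair never co-occurs.
        return -1.0
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)
```

With the probabilities from the ("strong", "tea") example, `npmi(0.00005, 0.01, 0.005)` is approximately 0, matching the unnormalized result of independence.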
When to use it
Use PMI when:
- Identifying meaningful collocations (“strong tea” vs. “strong coffee”)
- Reducing noise in co-occurrence matrices
- Weighting term associations for semantic analysis
- Corpus linguistics and phrase extraction
- Measuring term independence
Because PMI amplifies rare pairs, apply PPMI or smoothing for robustness. PMI is a standard association measure in computational linguistics.
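One way to put this into practice is to weight a co-occurrence table with PPMI and drop pairs below a minimum count. The sketch below is illustrative only: `cooccurrence_counts`, `ppmi_table`, the `window` and `min_count` parameters, and the toy corpus are all assumptions, and the marginals are taken from the co-occurrence table itself rather than from raw token counts.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Symmetric counts of word pairs that appear within `window` tokens of each other."""
    pairs = Counter()
    for tokens in sentences:
        for idx, word in enumerate(tokens):
            for context in tokens[idx + 1 : idx + 1 + window]:
                pairs[(word, context)] += 1
                pairs[(context, word)] += 1
    return pairs


def ppmi_table(pairs, min_count=1):
    """PPMI for every pair seen at least `min_count` times."""
    total = sum(pairs.values())
    # Marginal count of each word: the sum of its row in the co-occurrence table.
    row = Counter()
    for (word, _), n in pairs.items():
        row[word] += n
    table = {}
    for (word, context), n in pairs.items():
        if n < min_count:
            continue  # skip rare pairs, whose PMI is least reliable
        value = math.log(n * total / (row[word] * row[context]))
        table[(word, context)] = max(0.0, value)
    return table


# Toy corpus, invented for illustration.
corpus = [
    ["strong", "coffee"],
    ["strong", "coffee"],
    ["strong", "tea"],
    ["hot", "tea"],
    ["hot", "tea"],
]
weights = ppmi_table(cooccurrence_counts(corpus), min_count=2)
```

Here the `min_count` threshold is a simple stand-in for smoothing: it removes the pairs whose PMI estimates rest on a single observation instead of adjusting their counts.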