Pointwise Mutual Information

What it is

Pointwise Mutual Information (PMI) measures the association between two terms by quantifying how much more often they co-occur than chance would predict. PMI = log(P(i,j) / (P(i) × P(j))); positive PMI indicates a meaningful association (collocation), negative PMI indicates the terms co-occur less than chance (avoidance), and zero indicates independence.

[illustrate: Scatter plot of two words with PMI scores; collocations (high PMI) vs. random co-occurrences (low PMI)]

How it works

PMI formula:

PMI(i, j) = log(P(i, j) / (P(i) × P(j)))
          = log(count(i, j) × N / (count(i) × count(j)))

Where:

  • P(i, j): joint probability (co-occurrence frequency)
  • P(i), P(j): marginal probabilities (individual frequencies)
  • N: corpus size

Interpretation:

  • PMI > 0: co-occur more than expected (association)
  • PMI = 0: independent (no association)
  • PMI < 0: co-occur less than expected (mutual exclusion)
  • Unbounded: very rare pairs can have extreme PMI
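The count form of the formula translates directly into code. A minimal sketch (the function name `pmi` is ours; natural log, matching the rest of this article):

```python
import math

def pmi(count_ij: int, count_i: int, count_j: int, n: int) -> float:
    """Pointwise mutual information from raw counts (natural log).

    PMI = log( (count_ij * n) / (count_i * count_j) ),
    equivalent to log(P(i,j) / (P(i) * P(j))) with P = count / n.
    """
    if min(count_ij, count_i, count_j, n) <= 0:
        raise ValueError("all counts must be positive")
    return math.log(count_ij * n / (count_i * count_j))

# Using the worked numbers from the Example section below:
print(pmi(50, 10_000, 5_000, 1_000_000))    # 0.0 -> independent
print(pmi(100, 10_000, 3_000, 1_000_000))   # ≈ 1.204 -> collocation
```

The guard on zero counts matters in practice: PMI is undefined (log of 0) for pairs that never co-occur, which is exactly the sparsity problem the Issues section describes.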

Issues:

  • Biased toward rare pairs: low marginal frequencies inflate PMI, so a pair seen once or twice can outscore genuine collocations
  • Negative PMI values are unreliable in sparse corpora; the common remedy is Positive PMI (PPMI) = max(0, PMI), which clips them to zero
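The PPMI fix is usually applied to an entire term-term co-occurrence matrix at once. A sketch using NumPy (`ppmi_matrix` is a hypothetical helper; cells with a zero count, whose raw PMI would be -inf, are left at 0):

```python
import numpy as np

def ppmi_matrix(counts: np.ndarray) -> np.ndarray:
    """Positive PMI over a term-term co-occurrence count matrix.

    expected[i, j] = count(i) * count(j); PMI compares observed
    co-occurrence against it, and negatives are clipped to zero.
    """
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    expected = counts.sum(axis=1, keepdims=True) * counts.sum(axis=0, keepdims=True)
    out = np.zeros_like(counts)
    nz = counts > 0                       # only defined where count(i,j) > 0
    out[nz] = np.maximum(np.log(counts[nz] * total / expected[nz]), 0.0)
    return out
```

Computing only over the non-zero mask avoids the log-of-zero warnings and NaN propagation that a naive vectorized `np.log(counts * total / expected)` would produce on sparse matrices.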

Example

Corpus: 1M tokens

Term pair: ("strong", "tea")
count(strong, tea) = 50
count(strong) = 10,000
count(tea) = 5,000
N = 1M

P(strong, tea) = 50 / 1M = 0.00005
P(strong) = 10k / 1M = 0.01
P(tea) = 5k / 1M = 0.005

PMI = log(0.00005 / (0.01 × 0.005))
    = log(0.00005 / 0.00005)
    = log(1) = 0

# Try ("strong", "coffee"):
count(strong, coffee) = 100  (more co-occurrence)
count(coffee) = 3,000
PMI = log(100/1M / (0.01 × 0.003))
    = log(0.0001 / 0.00003)
    = log(3.33) ≈ 1.2 using the natural log  (positive association, collocation)
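Both calculations can be reproduced directly from the probabilities; count(coffee) = 3,000 is inferred from the P(coffee) = 0.003 used in the arithmetic above:

```python
import math

n = 1_000_000

# ("strong", "tea"): observed co-occurrence exactly matches independence
pmi_tea = math.log((50 / n) / ((10_000 / n) * (5_000 / n)))
print(pmi_tea)      # ≈ 0 (independent)

# ("strong", "coffee"): P(coffee) = 0.003 implies count(coffee) = 3,000
pmi_coffee = math.log((100 / n) / ((10_000 / n) * (3_000 / n)))
print(pmi_coffee)   # ≈ 1.2 (collocation)
```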

Variants and history

PMI originates in information theory (Fano, 1961); Church and Hanks (1990) applied it to word association in corpora, and Turney (2001) used it for semantic similarity (PMI-IR). PPMI (Positive PMI) discards unreliable negative values. Normalized PMI (NPMI) divides by -log P(i, j), rescaling scores to [-1, 1] for comparability across pairs. The t-test and chi-square offer alternative association measures. PMI is widely used in collocation detection and semantic analysis.
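Normalized PMI can be sketched in a few lines (the function name `npmi` is ours). Under the usual convention, NPMI = PMI / -log P(i,j), giving 1 when the terms only ever occur together, 0 at independence, and -1 in the never-co-occur limit:

```python
import math

def npmi(p_ij: float, p_i: float, p_j: float) -> float:
    """Normalized PMI: PMI(i,j) / -log P(i,j), in [-1, 1].

    Requires 0 < p_ij <= min(p_i, p_j); undefined for p_ij = 0.
    """
    return math.log(p_ij / (p_i * p_j)) / -math.log(p_ij)

# Perfect co-occurrence: P(i,j) = P(i) = P(j)  -> 1.0
print(npmi(0.01, 0.01, 0.01))
# Independence: P(i,j) = P(i) * P(j)           -> 0.0
print(npmi(0.0001, 0.01, 0.01))
```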

When to use it

Use PMI when:

  • Identifying meaningful collocations (“strong tea” vs. “strong coffee”)
  • Reducing noise in co-occurrence matrices
  • Weighting term associations for semantic analysis
  • Corpus linguistics and phrase extraction
  • Measuring term independence

PMI amplifies rare pairs; apply PPMI or smoothing for robustness. Both are standard practice in computational linguistics.
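One common smoothing variant, popularized in word-embedding work, is context-distribution smoothing: raise context counts to a power α (typically 0.75) before normalizing, which dampens the rare-pair bias. A sketch under those assumptions (`smoothed_ppmi` is a hypothetical helper):

```python
import numpy as np

def smoothed_ppmi(counts: np.ndarray, alpha: float = 0.75) -> np.ndarray:
    """PPMI with context-distribution smoothing: P(j) ∝ count(j)**alpha."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    p_ij = counts / total
    p_i = counts.sum(axis=1, keepdims=True) / total
    col = counts.sum(axis=0) ** alpha          # smoothed context counts
    p_j = (col / col.sum()).reshape(1, -1)     # renormalize to a distribution
    out = np.zeros_like(counts)
    nz = counts > 0
    expected = p_i * p_j
    out[nz] = np.maximum(np.log(p_ij[nz] / expected[nz]), 0.0)
    return out
```

Raising counts to α < 1 shifts probability mass toward rare contexts, so their expected co-occurrence is higher and their PMI correspondingly lower.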

See also