Jaro-Winkler Similarity

What it is

Jaro-Winkler extends Jaro similarity by adding a bonus for matching prefixes. Strings that agree on initial characters receive higher scores, improving name matching where prefix agreement is informative.

How it works

Algorithm:

  1. Compute Jaro similarity (see Jaro)
  2. Count matching prefix length (up to 4 characters)
  3. Apply bonus: jw = jaro + (prefix_length * 0.1 * (1 - jaro))

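The prefix bonus raises the score only when the strings agree on their first characters. With the prefix capped at 4 characters and a weight of 0.1, the bonus is at most 0.4 * (1 - jaro), so it shrinks as Jaro approaches 1 and the result never exceeds 1. The 0.4 ceiling is theoretical: a shared prefix implies matching characters, so Jaro cannot be 0 when a bonus applies.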
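The three steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the names jaro, jaro_winkler, prefix_weight and max_prefix are chosen here for illustration, and the Jaro step assumes the common match window of floor(max(len1, len2) / 2) - 1.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1] (assumed convention: identical strings score 1)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two characters match only if they are equal and at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that line up in a different order,
    # counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score plus the prefix bonus from step 3."""
    j = jaro(s1, s2)

    # Step 2: length of the common prefix, capped at max_prefix.
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1

    # Step 3: jw = jaro + prefix_length * weight * (1 - jaro)
    return j + prefix * prefix_weight * (1 - j)


if __name__ == "__main__":
    print(round(jaro("MARTHA", "MARHTA"), 4))
    print(round(jaro_winkler("MARTHA", "MARHTA"), 4))
```

For the widely cited pair MARTHA / MARHTA, this sketch gives a Jaro score of about 0.944 and a Jaro-Winkler score of about 0.961: the shared prefix "MAR" (length 3) closes 30% of the gap between the Jaro score and 1.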

[illustrate: Two similar names with matching prefix highlighted; show Jaro vs Jaro-Winkler scores]

Example

JW(“algorithm”, “altruism”):

  • Jaro ≈ 0.727 (5 matching characters: a, l, r, i, m; no transpositions; see Jaro)
  • Matching prefix: “al” (length 2)
  • Bonus: 2 * 0.1 * (1 - 0.727) ≈ 0.055
  • JW ≈ 0.727 + 0.055 ≈ 0.78

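Names that differ only late in the string keep the full prefix bonus on top of an already high Jaro score. Names that differ within the first four characters receive a smaller bonus or none, so early disagreements cost more than late ones.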
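To make the effect of the prefix length concrete, the short sketch below applies the bonus formula to a fixed Jaro score. The function name winkler_adjust and the Jaro value 0.80 are hypothetical, chosen only for illustration; the defaults follow the weight 0.1 and prefix cap 4 given above.

```python
def winkler_adjust(jaro_score: float, prefix_len: int,
                   prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Apply the prefix bonus to an already computed Jaro score."""
    capped = min(prefix_len, max_prefix)  # prefix beyond the cap earns nothing
    return jaro_score + capped * prefix_weight * (1.0 - jaro_score)


# Same (hypothetical) Jaro score, increasing shared-prefix length.
for prefix_len in range(6):
    print(prefix_len, round(winkler_adjust(0.80, prefix_len), 3))
```

With a Jaro score of 0.80, each additional shared prefix character adds 0.02, and the score stops rising once the prefix reaches the cap of 4 (0.88). A pair that disagrees on its first character stays at its plain Jaro score.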

Variants and history

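Winkler (1990) extended Jaro similarity with the prefix adjustment to improve name matching. A prefix weight of 0.1 is standard; with a prefix limit of 4, the weight must not exceed 0.25, or scores can rise above 1. Some implementations make the prefix limit configurable (the usual default is 4), and some apply the bonus only when the Jaro score exceeds a threshold, commonly 0.7. Jaro-Winkler is widely used in record linkage and entity resolution tools.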

When to use it

Use Jaro-Winkler for name matching, record linkage, and duplicate detection. It is more effective than Jaro alone when agreement on the first few characters is informative, as it often is for names. It suits short strings. It is less appropriate for long texts, where a prefix of at most four characters says little about overall similarity. It is a common default for entity matching in data integration.

See also