Jaro-Winkler Similarity
What it is
Jaro-Winkler extends Jaro similarity by adding a bonus for matching prefixes. Strings that agree on initial characters receive higher scores, improving name matching where prefix agreement is informative.
How it works
Algorithm:
- Compute Jaro similarity (see Jaro)
- Count matching prefix length (up to 4 characters)
- Apply bonus:
jw = jaro + (prefix_length * 0.1 * (1 - jaro))
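The prefix bonus raises the score only when the strings agree on their first characters. With a prefix limit of 4 and a weight of 0.1, the bonus is at most 0.4 * (1 - jaro): it closes up to 40% of the gap between the Jaro score and 1, so the result never exceeds 1.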
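The steps above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function names `jaro` and `jaro_winkler` and the parameter names `prefix_weight` and `max_prefix` are chosen here for illustration, and the Jaro part assumes the usual match window of floor(max(len1, len2) / 2) - 1.

```python
def jaro(s1: str, s2: str) -> float:
    """Plain Jaro similarity in [0, 1] (assumed standard definition)."""
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        # Convention chosen for this sketch: an empty string matches nothing.
        return 0.0
    if s1 == s2:
        return 1.0

    # Equal characters count as a match only if their positions
    # are at most `window` apart.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Matched characters that line up in a different order;
    # each swapped pair counts as one transposition.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (matches / len1
            + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str,
                 prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Jaro score plus a bonus for a shared prefix of up to max_prefix chars."""
    j = jaro(s1, s2)
    prefix_length = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix_length += 1
    return j + prefix_length * prefix_weight * (1 - j)


# MARTHA / MARHTA is a commonly cited test pair (shared prefix "MAR").
print(round(jaro("MARTHA", "MARHTA"), 3))          # 0.944
print(round(jaro_winkler("MARTHA", "MARHTA"), 3))  # 0.961
```

The two strings share a three-character prefix, so the bonus is 3 * 0.1 * (1 - 0.944), which lifts the score from about 0.944 to about 0.961.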
[illustrate: Two similar names with matching prefix highlighted; show Jaro vs Jaro-Winkler scores]
Example
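JW(“algorithm”, “altruism”):
- Jaro ≈ 0.7269: with a match window of 3, five characters match (a, l, r, i, m) and none are transposed, so Jaro = (5/9 + 5/8 + 5/5) / 3
- Matching prefix: “al” (length 2)
- Bonus: 2 * 0.1 * (1 - 0.7269) ≈ 0.0546
- JW ≈ 0.7269 + 0.0546 ≈ 0.7815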
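Names that differ only late in the string keep a long shared prefix, so they receive the full bonus on top of an already high Jaro score. Names that differ within the first four characters receive little or no bonus, so their score stays close to plain Jaro.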
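To see how much the prefix alone contributes, the sketch below holds the Jaro score fixed at a hypothetical 0.80 and varies only the shared prefix length. The helper name `winkler_adjust` and its parameters are illustrative assumptions, not part of any particular library.

```python
def winkler_adjust(jaro_score: float, prefix_length: int,
                   prefix_weight: float = 0.1, max_prefix: int = 4) -> float:
    """Apply the prefix bonus to an already-computed Jaro score."""
    # Prefix characters beyond max_prefix earn no extra bonus.
    capped = min(prefix_length, max_prefix)
    return jaro_score + capped * prefix_weight * (1 - jaro_score)


# Same hypothetical Jaro score, increasing shared prefix length.
for prefix_length in range(6):
    print(prefix_length, round(winkler_adjust(0.80, prefix_length), 3))
# 0 0.8
# 1 0.82
# 2 0.84
# 3 0.86
# 4 0.88
# 5 0.88
```

Each shared prefix character closes another 10% of the gap to 1, and the score stops rising once the prefix limit of 4 is reached.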
Variants and history
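Winkler (1990) extended Jaro similarity to improve name matching. A prefix weight of 0.1 and a prefix limit of 4 characters are the standard settings. Some implementations make these configurable, in which case the weight multiplied by the limit should not exceed 1 so that the score stays within [0, 1]. The measure is widely used in record linkage and entity resolution tools.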
When to use it
Use it for name matching, record linkage, and duplicate detection. It is more effective than Jaro alone for names where prefix agreement matters, and it suits short strings. It is less appropriate for long texts, where the first few characters carry little information. It is a common default for entity matching in data integration.