Damerau-Levenshtein Distance
What it is
Damerau-Levenshtein distance (DLD) extends Levenshtein distance by allowing transposition of adjacent characters as a single edit operation, capturing common typos like “recieve” vs “receive”.
How it works
Two variants exist:
Optimal String Alignment (OSA): restricted to editing each substring once. Allows insertions, deletions, substitutions, and transpositions. Compute via DP similar to Levenshtein but with additional case:
dp[i][j] = min(
dp[i-1][j] + 1, // deletion
dp[i][j-1] + 1, // insertion
dp[i-1][j-1] + cost, // substitution
dp[i-2][j-2] + 1 if a[i-1]==b[j-2] and a[i-2]==b[j-1] // transposition
)
Unrestricted DLD: allows unrestricted edits. More complex; uses distance between all prefix pairs. Recommended for accuracy.
[illustrate: DP matrix for “algorithm” vs “aglorithm” showing transposition operation]
Example
OSA(“algorithm”, “aglorithm”) = 1 (one transposition: lgo → ogl)
Levenshtein would require 2 edits (e.g., substitute l→o, o→l).
Variants and history
Introduced by Damerau (1964). OSA variant is simpler and widely implemented. Unrestricted variant is slower but more accurate. Used in spell checking, where transpositions represent ~80% of single-error typos in real data.
When to use it
Improve typo tolerance when transpositions are common (keyboard adjacent swaps). Useful in spell checking and OCR. Choose OSA for simplicity and performance; unrestricted DLD for accuracy if needed. Add cost weighting for keyboard proximity or common OCR errors.