Damerau-Levenshtein Distance

What it is

Damerau-Levenshtein distance (DLD) extends Levenshtein distance by allowing transposition of adjacent characters as a single edit operation, capturing common typos like “recieve” vs “receive”.

How it works

Two variants exist:

Optimal String Alignment (OSA): restricted to editing each substring once. Allows insertions, deletions, substitutions, and transpositions. Compute via DP similar to Levenshtein but with additional case:

dp[i][j] = min(
  dp[i-1][j] + 1,                    // deletion
  dp[i][j-1] + 1,                    // insertion
  dp[i-1][j-1] + cost,               // substitution
  dp[i-2][j-2] + 1  if a[i-1]==b[j-2] and a[i-2]==b[j-1]  // transposition
)

Unrestricted DLD: allows unrestricted edits. More complex; uses distance between all prefix pairs. Recommended for accuracy.

[illustrate: DP matrix for “algorithm” vs “aglorithm” showing transposition operation]

Example

OSA(“algorithm”, “aglorithm”) = 1 (one transposition: lgo → ogl)

Levenshtein would require 2 edits (e.g., substitute l→o, o→l).

Variants and history

Introduced by Damerau (1964). OSA variant is simpler and widely implemented. Unrestricted variant is slower but more accurate. Used in spell checking, where transpositions represent ~80% of single-error typos in real data.

When to use it

Improve typo tolerance when transpositions are common (keyboard adjacent swaps). Useful in spell checking and OCR. Choose OSA for simplicity and performance; unrestricted DLD for accuracy if needed. Add cost weighting for keyboard proximity or common OCR errors.

See also