Levenshtein Distance

What it is

Levenshtein distance (LD) is the minimum number of single-character edits (insert, delete, substitute) required to transform one string into another. It is the most widely-used edit distance variant.

How it works

Algorithm (Wagner-Fischer dynamic programming):

  1. Create a matrix where rows represent characters of string A, columns represent string B
  2. Initialise first row and column with indices (edit distance from empty string)
  3. Fill cells iteratively: for each cell (i, j), take minimum of:
    • Cell above + 1 (deletion)
    • Cell to left + 1 (insertion)
    • Cell diagonally above-left + 0 (if characters match) or + 1 (substitution)
  4. Bottom-right cell contains final distance

Time: O(mn), space: O(mn) for strings of length m and n. Optimisable to O(min(m, n)) space with careful iteration.

[illustrate: DP matrix for “algorithm” vs “altruistic” showing completed table with final distance value]

Example

LD(“algorithm”, “altruistic”) = 6

One transformation sequence:

  1. algorithm → altgorithm (substitute: a→a, l→l, g→t)
  2. altgorithm → altruithm (substitute: g→u, o→i)
  3. altruithm → altruistic (substitute: h→s, m→t, insert: c)

Variants and history

Named after Vladimir Levenshtein. Used in spell checkers, record linkage, DNA sequence analysis. Variations include Optimal String Alignment (restricted to non-crossing edits) and weighted variants. Modern implementations use early termination and memoisation for practical performance.

When to use it

Standard metric for typo tolerance in search (fuzzy queries, spell checking). Use when approximate string matching is needed. For long strings, approximate methods (BK-tree lookup, LSH) provide speedup. Not suitable for semantic similarity or long-document matching.

See also