Hamming Distance
What it is
Hamming distance counts the number of positions where two equal-length strings differ. It only allows substitution, making it unsuitable for strings of different lengths but very efficient for fixed-length codes.
How it works
Computation is trivial: compare strings position-by-position, counting mismatches.
distance = sum(a[i] != b[i] for i in range(len(a)))
O(n) time, O(1) space. For binary strings, this is popcount (population count) of the XOR of the two bit sequences, enabling hardware acceleration.
[illustrate: Two equal-length strings aligned with mismatches highlighted]
Example
HD(“kitten”, “mitten”) = 1 (position 0: k ≠ m) HD(“saturday”, “sunday”) = 3 (positions 1, 5, 6 differ)
Variants and history
Developed by Richard Hamming for error-correcting codes. Fundamental in coding theory, cryptography, and bioinformatics. Hamming space refers to the metric space induced by hamming distance on binary strings.
When to use it
Fixed-length strings and codes only. Binary data and bit sequences (XOR-based comparison). DNA sequence comparison (if sequences are pre-aligned). Efficient for large-scale similarity search when string length is constant. Not suitable for variable-length strings or when insertions/deletions matter.