String-Metrics
-
Levenshtein Distance
Edit distance that counts the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The most widely used metric for string similarity and typo tolerance.
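
A minimal sketch of the usual dynamic-programming computation, keeping only two rows of the table at a time. The function name `levenshtein` is illustrative and is not taken from any particular library.

```python
def levenshtein(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for every operation."""
    # Keep the shorter string in the inner loop so the rows stay small.
    if len(a) < len(b):
        a, b = b, a
    # prev[j] is the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep on a match
            ))
        prev = curr
    return prev[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

The sketch runs in time proportional to the product of the two lengths and uses memory proportional to the shorter string.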
-
Jaro-Winkler Similarity
Jaro similarity with a bonus for a shared prefix (conventionally capped at four characters), so strings that agree at the start score higher. Often used for name and record matching.
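
The prefix bonus can be sketched as an adjustment applied to an already computed Jaro score. The function name `winkler_adjust` is hypothetical; the default scaling factor of 0.1 and the four-character prefix cap are the conventional values, assumed here rather than stated in this glossary.

```python
def winkler_adjust(jaro_score: float, s1: str, s2: str,
                   scaling: float = 0.1, max_prefix: int = 4) -> float:
    """Apply the Winkler prefix bonus to a precomputed Jaro score."""
    # Count leading characters that agree, up to max_prefix.
    prefix = 0
    for c1, c2 in zip(s1, s2):
        if c1 != c2 or prefix == max_prefix:
            break
        prefix += 1
    # Move the score toward 1.0 in proportion to the shared prefix length.
    return jaro_score + prefix * scaling * (1.0 - jaro_score)


# The Jaro similarity of "MARTHA" and "MARHTA" is 17/18; they share "MAR".
print(round(winkler_adjust(17 / 18, "MARTHA", "MARHTA"), 4))  # 0.9611
```

With `scaling` at or below 0.25 and a prefix cap of four, the adjusted score cannot exceed 1.0.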
-
Jaro Similarity
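Similarity score between 0 (nothing in common) and 1 (identical), designed for short strings. It is based on the number of matching characters found within a limited window and the number of matches that appear in a different order (transpositions). Commonly used in record linkage and data quality work.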
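
A sketch of the standard formulation, assuming the usual match window of half the longer length minus one. The function name `jaro` is illustrative.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters match only if equal and no further apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Walk both match sequences in order and count positions that disagree;
    # half of that count is the number of transpositions.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order // 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3


print(round(jaro("MARTHA", "MARHTA"), 4))  # 0.9444
```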
-
Hamming Distance
Number of positions at which two equal-length strings differ; it is undefined for strings of different lengths. Computable in a single linear pass, which makes it a natural fit for fixed-length codes and binary data.
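
A short sketch for strings, plus a bitwise variant for non-negative integers. Both function names are illustrative, and raising `ValueError` on a length mismatch is one possible design choice among several.

```python
def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length inputs")
    return sum(x != y for x, y in zip(a, b))


def hamming_bits(x: int, y: int) -> int:
    """Count differing bits between two non-negative integers."""
    # XOR leaves a 1 exactly where the two bit patterns disagree.
    return bin(x ^ y).count("1")


print(hamming("karolin", "kathrin"))         # 3
print(hamming_bits(0b1011101, 0b1001001))    # 2
```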
-
Edit Distance
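Minimum number of single-character operations (insertions, deletions, substitutions) needed to transform one string into another. With this operation set it coincides with Levenshtein distance; other metrics in the family change the allowed operations, for example Hamming distance (substitutions only) and Damerau-Levenshtein distance (adds adjacent transpositions).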
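
The definition translates directly into a recurrence over string prefixes. The sketch below writes that recurrence as a memoized recursion and exposes per-operation costs to show how the family generalizes; the function name `edit_distance` and the cost parameters are hypothetical, not part of any standard API.

```python
from functools import lru_cache


def edit_distance(a: str, b: str,
                  ins_cost: int = 1, del_cost: int = 1, sub_cost: int = 1) -> int:
    """Minimum total cost of edits turning a into b."""

    @lru_cache(maxsize=None)
    def d(i: int, j: int) -> int:
        # d(i, j) is the cost of turning a[:i] into b[:j].
        if i == 0:
            return j * ins_cost  # insert the first j characters of b
        if j == 0:
            return i * del_cost  # delete the first i characters of a
        same = a[i - 1] == b[j - 1]
        return min(
            d(i - 1, j) + del_cost,
            d(i, j - 1) + ins_cost,
            d(i - 1, j - 1) + (0 if same else sub_cost),
        )

    return d(len(a), len(b))


print(edit_distance("kitten", "sitting"))              # 3
print(edit_distance("kitten", "sitting", sub_cost=2))  # 5
```

Because the recursion depth grows with the combined input length, this form suits short strings only; an iterative table avoids that limit.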
-
Damerau-Levenshtein Distance
Edit distance that adds transposition of two adjacent characters to the insertions, deletions, and substitutions of Levenshtein distance. A swapped pair such as "teh" for "the" therefore costs one edit instead of two.
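
A sketch of the restricted variant, often called optimal string alignment, which never edits the same substring twice. This is an assumption about which variant is wanted: the unrestricted Damerau-Levenshtein distance needs extra bookkeeping and can give smaller values. The function name `osa_distance` is illustrative.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance (restricted Damerau-Levenshtein)."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] is the distance between a[:i] and b[:j].
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition: the last two characters are swapped.
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(osa_distance("teh", "the"))  # 1
print(osa_distance("ca", "abc"))   # 3
```

The second call shows the restriction: the unrestricted distance between "ca" and "abc" is 2 (transpose, then insert "b" between the swapped characters), which this variant does not allow.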