Fuzzy Query
What it is
A fuzzy query finds documents containing terms similar to the query term within an edit distance threshold. It tolerates typos, misspellings, and phonetic variations, improving recall on imperfectly-spelled queries.
How it works
Fuzzy queries build an automaton or enumerate candidates from the inverted index:
- Convert query term to an edit-distance automaton accepting all strings within specified distance (typically 1 or 2)
- Intersect automaton with indexed terms (or enumerate index terms within distance)
- Retrieve postings for matching terms
- Union results
At index time, some systems precompute ngram indices enabling fast fuzzy matching. At query time, the distance threshold restricts the automaton complexity.
[illustrate: Edit distance automaton for “lucene” with max distance 2, showing accepted terms like “lucen”, “lucenne”, “lucane”]
Example
Query: lucene~1 (fuzzy with distance 1)
Matches: “lucene”, “lucen”, “lucenne”, “luc”, “lucine”, “lucene”, etc.
Retrieves documents containing the exact term “lucene” plus typos within edit distance 1.
Variants and history
Damerau-Levenshtein: includes transpositions for common typos. Prefix fuzziness: aut*~1 allows distance only on prefix. Elasticsearch, Solr, and most modern systems support fuzzy queries with configurable distance. Lucene uses a numeric suffix: term~2.
When to use it
Essential for user-facing search with typo tolerance. Use distance=1 for short terms, distance=2 for longer terms. Adds query latency; not suitable for real-time strict requirements. More forgiving than prefix queries but less efficient.