Fuzzy Query

What it is

A fuzzy query finds documents containing terms similar to the query term within an edit distance threshold. It tolerates typos, misspellings, and phonetic variations, improving recall on imperfectly-spelled queries.

How it works

Fuzzy queries build an automaton or enumerate candidates from the inverted index:

  1. Convert query term to an edit-distance automaton accepting all strings within specified distance (typically 1 or 2)
  2. Intersect automaton with indexed terms (or enumerate index terms within distance)
  3. Retrieve postings for matching terms
  4. Union results

At index time, some systems precompute ngram indices enabling fast fuzzy matching. At query time, the distance threshold restricts the automaton complexity.

[illustrate: Edit distance automaton for “lucene” with max distance 2, showing accepted terms like “lucen”, “lucenne”, “lucane”]

Example

Query: lucene~1 (fuzzy with distance 1)

Matches: “lucene”, “lucen”, “lucenne”, “luc”, “lucine”, “lucene”, etc.

Retrieves documents containing the exact term “lucene” plus typos within edit distance 1.

Variants and history

Damerau-Levenshtein: includes transpositions for common typos. Prefix fuzziness: aut*~1 allows distance only on prefix. Elasticsearch, Solr, and most modern systems support fuzzy queries with configurable distance. Lucene uses a numeric suffix: term~2.

When to use it

Essential for user-facing search with typo tolerance. Use distance=1 for short terms, distance=2 for longer terms. Adds query latency; not suitable for real-time strict requirements. More forgiving than prefix queries but less efficient.

See also