Wildcard Query

What it is

A wildcard query accepts glob patterns where * matches zero or more characters and ? matches exactly one character. It retrieves all indexed terms matching the pattern, then returns documents containing those terms.

How it works

Wildcard queries compile patterns to finite automata or use ngram indices:

  1. Parse pattern into a state machine accepting matching strings
  2. Scan inverted index (or ngram structure) to enumerate matching terms
  3. Retrieve and merge postings for all matching terms
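
The three steps above can be sketched in Python as follows. This is a minimal illustration rather than any particular engine's implementation: the inverted index is assumed to be a plain dict from term to a list of document IDs, the names glob_to_regex and wildcard_search are hypothetical, and Python's re module stands in for a purpose-built automaton.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Step 1: translate a glob (* and ?) into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")   # zero or more characters
        elif ch == "?":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_search(index: dict[str, list[int]], pattern: str) -> list[int]:
    matcher = glob_to_regex(pattern)
    doc_ids: set[int] = set()
    # Step 2: enumerate matching terms (here by testing every term).
    for term, postings in index.items():
        if matcher.fullmatch(term):
            # Step 3: merge the postings of each matching term.
            doc_ids.update(postings)
    return sorted(doc_ids)


# Hypothetical toy index: term -> IDs of documents containing it.
index = {
    "johnson": [1, 4],
    "jolie": [2],
    "jones": [1, 3],
    "smith": [5],
}
print(wildcard_search(index, "jo*"))
```

The sketch tests every term in the dictionary, so its cost grows with the vocabulary regardless of the pattern.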

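Leading wildcards (*term) are expensive: a term dictionary is typically sorted, so a pattern with no literal prefix gives the engine nowhere to seek and forces it to test every term. Many systems mitigate this by rejecting leading wildcards or by requiring a minimum literal prefix before the first wildcard. Ngram indices help in a different way: edge-ngrams make prefix patterns cheap, while full ngrams (e.g., trigrams) can narrow the candidate terms for leading and infix patterns.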
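
One way an ngram structure can help with a leading wildcard is sketched below, assuming a trigram index over the term dictionary. The function names are hypothetical, patterns are assumed to contain only * and ? as metacharacters, and fnmatchcase from Python's standard library is used for the final check. Every trigram found in a literal run of the pattern must also occur in any matching term, so intersecting the trigram lists yields a small candidate set that is then verified.

```python
import re
from collections import defaultdict
from fnmatch import fnmatchcase


def trigrams(text: str) -> set[str]:
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(terms):
    """Map each trigram to the set of terms that contain it."""
    idx = defaultdict(set)
    for term in terms:
        for gram in trigrams(term):
            idx[gram].add(term)
    return idx


def candidates(pattern, terms, idx):
    """Terms that contain every trigram required by the pattern's literals."""
    required = set()
    for literal in re.split(r"[*?]", pattern):
        required |= trigrams(literal)
    if not required:
        return set(terms)  # nothing to filter on: fall back to a full scan
    return set.intersection(*(idx.get(g, set()) for g in required))


def match(pattern, terms, idx):
    """Filter with the trigram index, then verify each candidate."""
    return sorted(t for t in candidates(pattern, terms, idx)
                  if fnmatchcase(t, pattern))


terms = ["information", "automation", "nation", "matrix", "informal"]
idx = build_trigram_index(terms)
print(match("*mation", terms, idx))
```

The verification step is still needed, because sharing trigrams with the pattern does not guarantee that they appear in the right order or position.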

[illustrate: State machine for pattern “in?ormation*” and which terms it accepts]
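
As a rough textual stand-in for such a state machine, the sketch below simulates a nondeterministic automaton for a glob pattern directly. It is an assumption-laden illustration: state i means "the first i pattern symbols have been consumed", the function name accepts is hypothetical, and production engines generally compile to a deterministic automaton instead of simulating state sets per term.

```python
def accepts(pattern: str, term: str) -> bool:
    """Return True if the glob pattern (* and ?) matches the whole term."""

    def closure(states):
        # '*' may match zero characters, so it can be skipped without input.
        seen = set(states)
        stack = list(states)
        while stack:
            i = stack.pop()
            if i < len(pattern) and pattern[i] == "*" and i + 1 not in seen:
                seen.add(i + 1)
                stack.append(i + 1)
        return seen

    current = closure({0})
    for ch in term:
        nxt = set()
        for i in current:
            if i == len(pattern):
                continue          # accepting state has no outgoing edges
            sym = pattern[i]
            if sym == "*":
                nxt.add(i)        # stay on '*' while consuming ch
            elif sym == "?" or sym == ch:
                nxt.add(i + 1)    # consume exactly one character
        current = closure(nxt)
        if not current:
            return False          # no live state left: reject early
    return len(pattern) in current


for term in ["information", "informational", "inzormations", "inormation"]:
    print(term, accepts("in?ormation*", term))
```

Under this sketch, “information”, “informational”, and “inzormations” are accepted, while “inormation” is rejected because ? must consume exactly one character.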

Example

  Query: author:jo* (finds “jones”, “johnson”, “jolie”)
  Query: title:informat?on (finds “information”; does not find “informaton”, because ? must match exactly one character)

Variants and history

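Patterns are often classified by where the wildcard sits: prefix wildcards (term*) place it at the end only, suffix wildcards (*term) place it at the start, and infix wildcards (te*m) place it in the middle. The metacharacters vary by system: Lucene uses * and ?, while PostgreSQL's LIKE uses % (any sequence of characters) and _ (exactly one character). Regular expressions are more expressive but generally slower to evaluate.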
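
Because the two syntaxes differ only in their metacharacters, a pattern can be translated mechanically. The sketch below is one possible conversion from * and ? to % and _, with a hypothetical function name. It assumes backslash as the LIKE escape character and ignores any escaping rules of the source syntax. SQLite's in-memory database is used purely as a convenient stand-in to show the result in a query; its LIKE is case-insensitive for ASCII by default.

```python
import sqlite3


def wildcard_to_like(pattern: str, escape: str = "\\") -> str:
    """Convert a */? glob into a SQL LIKE pattern."""
    out = []
    for ch in pattern:
        if ch == "*":
            out.append("%")            # any sequence of characters
        elif ch == "?":
            out.append("_")            # exactly one character
        elif ch in ("%", "_", escape):
            out.append(escape + ch)    # literal that LIKE would misread
        else:
            out.append(ch)
    return "".join(out)


# Hypothetical table and data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (name TEXT)")
conn.executemany(
    "INSERT INTO authors VALUES (?)",
    [("jones",), ("johnson",), ("jolie",), ("smith",)],
)
rows = conn.execute(
    "SELECT name FROM authors WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
    (wildcard_to_like("jo*"),),
).fetchall()
print([name for (name,) in rows])
```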

When to use it

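Wildcard queries are useful when only part of a term is known, and for parametric searches. They are less expressive than regular expressions but usually cheaper to evaluate. Leading wildcards become a problem at scale, since they can force a scan of the whole term dictionary. For pure prefix matching, prefer a dedicated prefix query, which has better index support.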
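
The index support for prefix matching can be illustrated with a sorted term list, as in the sketch below. The term list and the function name prefix_terms are hypothetical, and a real term dictionary would more likely be a trie or an FST than a Python list. The point is that a binary search jumps to the first possible match and the scan stops at the first term that no longer shares the prefix.

```python
from bisect import bisect_left


def prefix_terms(sorted_terms: list[str], prefix: str) -> list[str]:
    """Return all terms starting with prefix from a sorted term list."""
    matches = []
    # Binary search: position of the first term >= prefix.
    i = bisect_left(sorted_terms, prefix)
    # Matching terms are contiguous in sorted order, so stop at the first miss.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        matches.append(sorted_terms[i])
        i += 1
    return matches


sorted_terms = ["jackson", "johnson", "jolie", "jones", "smith"]
print(prefix_terms(sorted_terms, "jo"))
```

Only the matching range is visited, whereas a leading wildcard offers no such starting point.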

See also