Query-time Analysis
What it is
Query-time analysis is the processing applied to a user’s query string before matching it against the inverted index. It transforms the raw query text into tokens using the same (or a compatible) analysis chain used at index time — so that query terms align with the index’s vocabulary.
If index-time analysis produced "fox" from "foxes", query-time analysis must also produce "fox" from a query containing "foxes". Mismatched analysis between index and query time is one of the most common sources of unexpected non-matches.
How it works
The query analyzer runs in the same sequence as the index analyzer: character filters → tokenizer → token filters. The resulting tokens become the terms used for postings list lookups.
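The three stages can be sketched as a small Python function. This is a simplified illustration, not any search engine's actual implementation: `analyze`, `strip_html`, `word_tokenizer`, and `lowercase` are hypothetical names, and the tokenizer is a plain regex rather than a Unicode-aware one.

```python
import re
from typing import Callable, Iterable, List

CharFilter = Callable[[str], str]
Tokenizer = Callable[[str], List[str]]
TokenFilter = Callable[[List[str]], List[str]]


def analyze(text: str,
            char_filters: Iterable[CharFilter],
            tokenizer: Tokenizer,
            token_filters: Iterable[TokenFilter]) -> List[str]:
    """Run text through character filters, then a tokenizer, then token filters."""
    for char_filter in char_filters:
        text = char_filter(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:
        tokens = token_filter(tokens)
    return tokens


def strip_html(text: str) -> str:
    # Hypothetical character filter: replace anything that looks like a tag.
    return re.sub(r"<[^>]+>", " ", text)


def word_tokenizer(text: str) -> List[str]:
    # Simplified tokenizer: runs of ASCII letters and digits become tokens.
    return re.findall(r"[A-Za-z0-9]+", text)


def lowercase(tokens: List[str]) -> List[str]:
    return [t.lower() for t in tokens]


query_terms = analyze("<b>Quick</b> FOX", [strip_html], word_tokenizer, [lowercase])
print(query_terms)  # ['quick', 'fox']
```

Passing the stages in as arguments makes it easy to give the index side and the query side different filter lists while sharing the same tokenizer.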
Unlike index-time analysis, query-time analysis operates on a short string rather than a document, and some token filters have different consequences when applied to the query:
- Edge n-gram: should not be applied at query time. If the query "shoe" is edge-n-grammed, it generates ["sh", "sho", "shoe"], so the query also matches documents that merely share the prefix "sh" or "sho" rather than the intended word.
- Synonym injection: can be applied at either index or query time, but query-time synonyms allow updating synonym lists without reindexing.
- Stop word removal: if applied at query time, single-stop-word queries (e.g. a query for just "the") return no results, because no terms are left to look up.
[illustrate: side-by-side comparison of index-time and query-time analysis pipelines — index pipeline includes edge n-gram step, query pipeline does not — showing how a query “running” transforms to [“run”] at query time, matching the stemmed [“run”] in the index without producing prefix tokens]
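The edge n-gram and stop word pitfalls listed above can be reproduced in a few lines. This is a minimal sketch, assuming a hypothetical `edge_ngrams` helper with a minimum gram length of 2 and a tiny illustrative stop word set (real lists are much longer).

```python
from typing import List

STOP_WORDS = {"the", "a", "an", "of"}  # illustrative subset, not a real list


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 8) -> List[str]:
    """Return the prefixes of token with lengths min_gram..max_gram."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def remove_stop_words(tokens: List[str]) -> List[str]:
    return [t for t in tokens if t not in STOP_WORDS]


# Edge n-grams at query time: the query now also looks up "sh" and "sho",
# prefixes that an edge-n-grammed index also stores for unrelated words
# such as "shirt" or "shop".
print(edge_ngrams("shoe"))         # ['sh', 'sho', 'shoe']

# Stop word removal at query time: nothing is left to look up.
print(remove_stop_words(["the"]))  # []
```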
Example
Field configured with:
- Index analyzer: standard → lowercase → porter2 → edge_ngram(2,8)
- Search analyzer: standard → lowercase → porter2
For document "Running shoes":
- Index terms: ["ru", "run", "sh", "sho", "shoe"]
For query "running":
- Query terms after search analyzer: ["run"]
- Lookup: does "run" appear in the postings list? Yes. → Match.
If the search analyzer also included edge n-gram, the query "running" would produce ["ru", "run"] — matching any document with tokens starting with “ru”, including unrelated results.
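The example can be simulated end to end with a toy inverted index. Several things here are assumptions made for illustration: the stemmer is a two-entry lookup table standing in for porter2, the tokenizer is a simple regex, the second document ("Rusty nails") is invented to show a false match, and `search` treats the query terms as an OR.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

# Stand-in for a real stemmer such as porter2: a lookup table that covers
# only the words in this example. Unknown words pass through unchanged.
TOY_STEMS = {"running": "run", "shoes": "shoe"}


def base_terms(text: str) -> List[str]:
    """Tokenize, lowercase, and stem (the steps shared by both analyzers)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [TOY_STEMS.get(t, t) for t in tokens]


def edge_ngrams(token: str, min_gram: int = 2, max_gram: int = 8) -> List[str]:
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]


def index_analyzer(text: str) -> List[str]:
    return [gram for term in base_terms(text) for gram in edge_ngrams(term)]


def search_analyzer(text: str) -> List[str]:
    return base_terms(text)  # no edge n-grams at query time


def build_index(docs: Dict[int, str]) -> Dict[str, Set[int]]:
    postings: Dict[str, Set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in index_analyzer(text):
            postings[term].add(doc_id)
    return postings


def search(postings: Dict[str, Set[int]], query_terms: List[str]) -> Set[int]:
    """Return the documents that contain any of the query terms."""
    hits: Set[int] = set()
    for term in query_terms:
        hits |= postings.get(term, set())
    return hits


docs = {1: "Running shoes", 2: "Rusty nails"}
postings = build_index(docs)

print(index_analyzer("Running shoes"))               # ['ru', 'run', 'sh', 'sho', 'shoe']
print(search(postings, search_analyzer("running")))  # {1}
print(search(postings, index_analyzer("running")))   # {1, 2}: "ru" also hits doc 2
```

The last line reuses the index analyzer on the query to show the failure mode: the extra "ru" term pulls in the unrelated second document.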
Variants and history
Elasticsearch exposes the separation of index and search analyzers in its mappings API, where analyzer sets the index-time analyzer and search_analyzer sets the query-time analyzer independently (when search_analyzer is omitted, the index analyzer is used for queries as well). Solr exposed this distinction earlier via <analyzer type="index"> and <analyzer type="query"> in schema.xml.
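As a rough sketch, an Elasticsearch index definition with separate analyzers might look like the following, written as a Python dict that would be serialized as the body of an index-creation request. The field name `title` and the analyzer and filter names are hypothetical, and the option names should be checked against the Elasticsearch version in use.

```python
import json

index_body = {
    "settings": {
        "analysis": {
            "filter": {
                # Hypothetical filter names; the "type" values are built-in filters.
                "porter2_stemmer": {"type": "stemmer", "language": "porter2"},
                "prefixes": {"type": "edge_ngram", "min_gram": 2, "max_gram": 8},
            },
            "analyzer": {
                "title_index": {
                    "type": "custom",
                    "tokenizer": "standard",
                    "filter": ["lowercase", "porter2_stemmer", "prefixes"],
                },
                "title_search": {
                    "type": "custom",
                    "tokenizer": "standard",
                    # Same chain as above, minus the edge n-gram step.
                    "filter": ["lowercase", "porter2_stemmer"],
                },
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "title_index",
                "search_analyzer": "title_search",
            }
        }
    },
}

print(json.dumps(index_body, indent=2))
```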
Synonym placement is the most consequential design choice between index-time and query-time:
- Index-time synonyms: synonyms are baked into the index, so they add no latency at query time, but updating them requires reindexing.
- Query-time synonyms: synonyms are applied fresh on each query, so the list can be updated without reindexing, at the cost of some added query latency.
Elasticsearch recommends query-time synonyms for flexibility, with the synonym_graph filter for accurate multi-token synonym handling.
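The flexibility of query-time synonyms can be shown with a toy index. In this sketch the postings, the synonym table, and the function names are all invented for illustration, and only single-token synonyms are handled; multi-token synonyms need graph-aware handling that is left out here.

```python
from typing import Dict, List, Set

# Built once. Synonyms are not baked into these postings.
postings: Dict[str, Set[int]] = {"sneaker": {1}, "trainer": {2}, "plimsoll": {3}}

# Query-time synonym table; it can change without touching the postings.
synonyms: Dict[str, List[str]] = {"sneaker": ["trainer"]}


def expand_query(terms: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Append each term's synonyms to the query term list."""
    expanded: List[str] = []
    for term in terms:
        expanded.append(term)
        expanded.extend(table.get(term, []))
    return expanded


def search(terms: List[str]) -> Set[int]:
    hits: Set[int] = set()
    for term in terms:
        hits |= postings.get(term, set())
    return hits


before = search(expand_query(["sneaker"], synonyms))
synonyms["sneaker"].append("plimsoll")  # update the list; no reindexing
after = search(expand_query(["sneaker"], synonyms))

print(before)  # {1, 2}
print(after)   # {1, 2, 3}
```

With index-time synonyms, the same change would require re-analyzing and re-adding every affected document.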
When to use it
Query-time analysis is modified most often when:
- Updating synonyms: edit the synonym filter in the search analyzer without touching the index.
- Debugging non-matches: use the _analyze API with analyzer: <search_analyzer> to see exactly what terms a query produces.
- Autocomplete fields: deliberately use a simpler search analyzer (standard + lowercase) to query an index built with edge n-grams.
Always test both index-time and query-time analysis together: use _analyze on a sample document at index time and on a sample query at search time to verify the terms align.
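One way to script that check is sketched below. It only builds the request path and body for Elasticsearch's _analyze endpoint and compares token lists; nothing is sent. The index name `products` and the analyzer names are hypothetical, and the two response dicts are hand-written stand-ins shaped like an _analyze response, not captured output.

```python
import json
from typing import Dict, List, Tuple


def analyze_request(index: str, analyzer: str, text: str) -> Tuple[str, str]:
    """Build the path and JSON body for an _analyze call. Nothing is sent."""
    body = json.dumps({"analyzer": analyzer, "text": text})
    return f"/{index}/_analyze", body


def tokens_from_response(response: Dict) -> List[str]:
    """Pull the term strings out of an _analyze-style response."""
    return [entry["token"] for entry in response["tokens"]]


def terms_align(index_terms: List[str], query_terms: List[str]) -> bool:
    """True when the query yields at least one term and all of them are indexed."""
    return bool(query_terms) and set(query_terms) <= set(index_terms)


# Hypothetical index and analyzer names.
index_call = analyze_request("products", "title_index", "Running shoes")
query_call = analyze_request("products", "title_search", "running")

# Hand-written stand-ins shaped like _analyze responses (not captured output).
index_response = {"tokens": [{"token": t} for t in ["ru", "run", "sh", "sho", "shoe"]]}
query_response = {"tokens": [{"token": "run"}]}

print(query_call)
print(terms_align(tokens_from_response(index_response),
                  tokens_from_response(query_response)))  # True
```

Naming the analyzer explicitly in each request matters here: the point is to compare the index analyzer's output with the search analyzer's output, so each call has to state which one it is exercising.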