Proximity Query

What it is

A proximity query retrieves documents where two or more terms appear within a specified maximum distance of each other, but not necessarily in order or contiguously. Distance is measured in token positions.

How it works

Unlike phrase queries, which require the terms to appear in a fixed order, proximity queries:

  1. Retrieve positions of all query terms from the positional index
  2. For each pair of terms, compute the minimum distance across all position occurrences
  3. Accept a document if every pair's minimum distance ≤ the specified maximum

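Distance is typically computed as min(|pos_a - pos_b|) across all occurrences of the two terms. Order is irrelevant: with a maximum distance of 3, the query “lucene elasticsearch” matches “elasticsearch is lucene-based”, where the terms sit 2 positions apart (assuming the tokenizer splits “lucene-based” at the hyphen).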
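The steps above can be sketched in a few lines of Python. This is a minimal illustration, not any engine's actual implementation: the helper names (build_positions, min_distance, proximity_match) are hypothetical, the tokenizer is a simple lowercase alphanumeric splitter, and the positional index is rebuilt per document rather than read from disk.

```python
import re
from collections import defaultdict
from itertools import combinations


def build_positions(text):
    """Map each lowercased token to the sorted list of positions where it occurs."""
    positions = defaultdict(list)
    # Assumption: tokens are runs of letters/digits, so "lucene-based" yields two tokens.
    for pos, token in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        positions[token].append(pos)
    return positions


def min_distance(pos_a, pos_b):
    """Smallest |a - b| over two sorted position lists, or None if either is empty."""
    best = None
    i = j = 0
    while i < len(pos_a) and j < len(pos_b):
        gap = abs(pos_a[i] - pos_b[j])
        if best is None or gap < best:
            best = gap
        # Advance the pointer at the smaller position; advancing the other
        # one could only widen the gap.
        if pos_a[i] < pos_b[j]:
            i += 1
        else:
            j += 1
    return best


def proximity_match(text, terms, max_distance):
    """True if every pair of terms occurs within max_distance token positions."""
    positions = build_positions(text)
    terms = [t.lower() for t in terms]
    if any(t not in positions for t in terms):
        return False
    for a, b in combinations(terms, 2):
        if min_distance(positions[a], positions[b]) > max_distance:
            return False
    return True


print(proximity_match("elasticsearch is lucene-based", ["lucene", "elasticsearch"], 3))
```

Because both position lists are sorted, the two-pointer walk in min_distance visits each position once instead of comparing every occurrence of one term against every occurrence of the other.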

[illustrate: Two documents with query terms at different distances; show which satisfies distance threshold and why]

Example

Query: lucene NEAR/5 performance

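Document 1: “Lucene performance is excellent” → Match (distance = 1)

Document 2: “Lucene is a search library. Performance benchmarks are here.” → Borderline match (distance = 5: “Lucene” is at position 0 and “Performance” at position 5, so the document passes the ≤ 5 test exactly and would fail under NEAR/4)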

Variants and history

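Window queries: all terms must fall within a fixed token window.

NEAR operator: an explicit operator that carries the maximum distance, as in lucene NEAR/5 performance.

Proximity slop: similar to phrase slop, but unordered.

PostgreSQL full-text search, SQL Server full-text search, and Lucene span queries all offer some form of proximity matching, though their syntax and ordering rules differ.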
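For the window variant with more than two terms, one possible approach is to find the smallest span that covers at least one occurrence of every term and compare it with the window size. The sketch below assumes that the window width is measured as the difference between the largest and smallest covered positions; the function names (smallest_window, within_window) and the input shape (a dict from term to sorted positions) are illustrative, not taken from any particular engine.

```python
from collections import Counter


def smallest_window(positions_by_term):
    """Return the smallest (max_pos - min_pos) span containing every term at
    least once, or None if any term has no occurrences."""
    if not positions_by_term or any(not p for p in positions_by_term.values()):
        return None
    # Flatten to (position, term) events ordered by position.
    events = sorted(
        (pos, term) for term, plist in positions_by_term.items() for pos in plist
    )
    needed = len(positions_by_term)
    counts = Counter()
    covered = 0  # number of distinct terms currently inside the window
    best = None
    left = 0
    for pos, term in events:
        counts[term] += 1
        if counts[term] == 1:
            covered += 1
        # While all terms are covered, record the span and shrink from the left.
        while covered == needed:
            left_pos, left_term = events[left]
            span = pos - left_pos
            if best is None or span < best:
                best = span
            counts[left_term] -= 1
            if counts[left_term] == 0:
                covered -= 1
            left += 1
    return best


def within_window(positions_by_term, window):
    """True if all terms fit inside a span of at most `window` positions."""
    span = smallest_window(positions_by_term)
    return span is not None and span <= window


example = {"a": [0, 10], "b": [4, 11], "c": [12]}
print(smallest_window(example), within_window(example, 3))
```

A single window test is stricter than checking each pair separately only when the pairwise minima come from different occurrences; the sliding window guarantees that one set of occurrences satisfies the bound for all terms at once.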

When to use it

Use a proximity query when terms should appear near each other but their order does not matter. It is common in patent and legal search, where “invention device” and “device invention” are treated as equivalent. Proximity queries require a positional index. They are more flexible than phrase queries but less precise than exact-phrase matching.

See also