Phrase Query

What it is

A phrase query matches documents in which the specified tokens appear in order, within a distance controlled by a slop parameter. With zero slop, the tokens must be adjacent; with positive slop, intervening tokens are permitted. Slop is the maximum number of single-position token moves allowed to bring the terms into the exact phrase order.

How it works

Phrase queries rely on a positional index, which records the position of every token occurrence within each field. The query engine:

  1. Retrieves positions of each term from the positional index
  2. Iterates through positions in ascending order
  3. Checks if positions satisfy the slop constraint:
    • Zero slop: positions must be consecutive (pos[i+1] == pos[i] + 1)
    • Positive slop: the total number of intervening positions between consecutive terms must be ≤ the slop value
  4. Returns documents where alignment succeeds
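
The steps above can be sketched in Python. This is a minimal illustration rather than any particular engine's implementation: it assumes terms must appear in query order, that each term's positions arrive as a sorted list for a single document, and that the total gap is the number of intervening positions between the first and last matched term. The names min_ordered_gap and phrase_matches are hypothetical.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def min_ordered_gap(position_lists: Sequence[Sequence[int]]) -> Optional[int]:
    """Return the smallest total gap over all in-order alignments.

    position_lists[i] is the sorted list of positions of the i-th query
    term in one document. Returns None if no in-order alignment exists.
    """
    if not position_lists or any(len(p) == 0 for p in position_lists):
        return None
    best = None
    for start in position_lists[0]:
        prev = start
        for positions in position_lists[1:]:
            # Earliest occurrence of the next term strictly after prev.
            idx = bisect_right(positions, prev)
            if idx == len(positions):
                prev = None
                break
            prev = positions[idx]
        if prev is None:
            continue
        # Intervening positions between the first and last matched term.
        gap = prev - start - (len(position_lists) - 1)
        if best is None or gap < best:
            best = gap
    return best


def phrase_matches(position_lists: Sequence[Sequence[int]], slop: int = 0) -> bool:
    """True if some in-order alignment needs at most `slop` intervening positions."""
    gap = min_ordered_gap(position_lists)
    return gap is not None and gap <= slop
```

For a fixed starting position, picking the earliest later occurrence of each following term gives the tightest alignment for that start, so the sketch only needs one pass per occurrence of the first term.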

[illustrate: Position alignment check for phrase “information retrieval” with slop=0 versus slop=2, showing how intervening tokens affect matching]

Example

Query: "information retrieval"

Document 1: “Information retrieval systems rank documents” → Match (slop=0)
Document 2: “Information and retrieval techniques” → No match (slop=0)
Document 2: “Information and retrieval techniques” → Match (slop=1, one intervening token)
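
The example can be reproduced with a small in-memory positional index. The sketch below is illustrative only: it assumes lowercased tokens split on non-alphanumeric characters, handles two-term phrases only, and uses hypothetical helper names (tokenize, build_positional_index, search).

```python
import re
from collections import defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase, split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_positional_index(docs):
    # term -> doc_id -> list of positions, in ascending order
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, tok in enumerate(tokenize(text)):
            index[tok][doc_id].append(pos)
    return index


def search(index, phrase, slop=0):
    """Doc IDs where the second term follows the first with at most
    `slop` intervening tokens (two-term phrases only)."""
    first, second = tokenize(phrase)
    a = index.get(first, {})
    b = index.get(second, {})
    hits = []
    for doc_id in sorted(a.keys() & b.keys()):
        if any(0 <= q - p - 1 <= slop for p in a[doc_id] for q in b[doc_id]):
            hits.append(doc_id)
    return hits


docs = {
    1: "Information retrieval systems rank documents",
    2: "Information and retrieval techniques",
}
index = build_positional_index(docs)

print(search(index, "information retrieval", slop=0))  # [1]
print(search(index, "information retrieval", slop=1))  # [1, 2]
```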

Variants and history

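Exact phrase: slop=0. Loose phrase: slop > 0, allowing intervening words. Lucene's span queries give finer control over positional matching, such as whether terms must appear in order. PostgreSQL's <-> operator matches adjacent lexemes in order; its <N> form requires an exact distance of N positions rather than a maximum, so it is stricter than slop. Query syntax typically quotes phrases, as in "exact phrase", or uses proximity operators such as NEAR/5.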

When to use it

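Useful for title matching, citation matching, and document similarity, where word order carries meaning. Use slop=0 for exact phrases; increase slop to tolerate inserted words or, in engines such as Lucene where swapping two adjacent terms costs two moves, reordered words. Phrase queries require a positional index, which takes more space than an inverted index that stores only document IDs and term frequencies.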
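
The storage cost can be estimated roughly by counting the integers each kind of index would store. The sketch below is a back-of-the-envelope illustration with a hypothetical function name (posting_sizes); it assumes whitespace tokenization and ignores compression, which real indexes apply to both document IDs and positions.

```python
def posting_sizes(docs):
    """Return (doc_only, positional) counts of stored integers.

    doc_only:   one document ID per distinct (term, document) pair.
    positional: the same document IDs plus one position per token occurrence.
    """
    doc_only = 0
    positional = 0
    for text in docs:
        tokens = text.lower().split()
        distinct = len(set(tokens))
        doc_only += distinct
        positional += distinct + len(tokens)
    return doc_only, positional


print(posting_sizes(["to be or not to be", "information retrieval systems"]))  # (7, 16)
```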

See also