Positional Index

What it is

A positional index is an inverted index that stores not just which documents contain each term, but also where — the token offsets at which each occurrence appears. This position information enables phrase queries ("quick brown fox" — words in exact sequence) and proximity queries ("quick" near "fox" — words within N tokens of each other).

Without positions, an index can only report which documents contain each term, not where the terms occur relative to one another. Phrase and proximity queries therefore cannot be answered from the index alone.

How it works

Each posting in a positional index carries a list of positions:

term "fox":
  → (docID=1, freq=2, positions=[3, 9])
  → (docID=4, freq=1, positions=[0])
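A minimal Python sketch of this posting layout follows. It is an illustration, not any engine's implementation. The names Posting and build_positional_index are hypothetical, tokenization is assumed to be naive lowercase whitespace splitting, and the two sample documents are invented so that the postings for "fox" reproduce the ones shown above. Note that freq is not stored separately here: it is simply the length of the positions list.

```python
from dataclasses import dataclass, field


@dataclass
class Posting:
    doc_id: int
    positions: list[int] = field(default_factory=list)

    @property
    def freq(self) -> int:
        # Term frequency equals the number of recorded positions.
        return len(self.positions)


def build_positional_index(docs: dict[int, str]) -> dict[str, list[Posting]]:
    """Map each term to its postings list, ordered by doc ID.

    Assumes naive lowercase whitespace tokenization; real engines use an analyzer.
    """
    index: dict[str, list[Posting]] = {}
    for doc_id in sorted(docs):
        for pos, token in enumerate(docs[doc_id].lower().split()):
            postings = index.setdefault(token, [])
            # Docs are visited in ID order, so this doc's posting (if any) is last.
            if not postings or postings[-1].doc_id != doc_id:
                postings.append(Posting(doc_id))
            postings[-1].positions.append(pos)
    return index


# Hypothetical documents chosen to reproduce the "fox" postings above.
docs = {
    1: "the quick brown fox saw the lazy dog and fox",
    4: "fox hunting season",
}
index = build_positional_index(docs)
for p in index["fox"]:
    print(p.doc_id, p.freq, p.positions)
# 1 2 [3, 9]
# 4 1 [0]
```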

A phrase query "brown fox" is evaluated by:

  1. Fetching the postings lists for "brown" and "fox".
  2. Finding documents containing both terms (standard AND intersection).
  3. For each candidate document, checking whether any position of "fox" is exactly one greater than some position of "brown", i.e., whether the two terms are adjacent.

For example, in D1:

  "brown" at [2], "fox" at [3]
  Check: 3 == 2 + 1  → YES, phrase matches
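The three steps can be sketched in Python as below. This is a minimal illustration under stated assumptions, not a particular engine's algorithm: the index is assumed to be an in-memory dict mapping each term to a dict of doc ID to sorted positions, the function name phrase_match is hypothetical, and document D7 is invented for contrast. The sketch generalises the adjacency check to phrases of any length: the i-th phrase term must occur at offset i from some start position of the first term.

```python
def phrase_match(index: dict[str, dict[str, list[int]]], phrase: list[str]) -> list[str]:
    """Return IDs of documents containing the phrase terms at consecutive positions."""
    if not phrase:
        return []
    # Step 1: fetch the postings for every phrase term (a missing term means no matches).
    postings = [index.get(term, {}) for term in phrase]
    # Step 2: AND-intersect the document sets.
    candidates = set(postings[0]).intersection(*postings[1:])
    matches = []
    for doc in sorted(candidates):
        # Step 3: a match starts at position `start` of the first term if the
        # i-th term occurs at start + i (offset 1 is plain adjacency).
        later = [set(p[doc]) for p in postings[1:]]
        if any(all(start + i in pos for i, pos in enumerate(later, 1))
               for start in postings[0][doc]):
            matches.append(doc)
    return matches


# Hypothetical index: D1 as in the example above, D7 made up (fox before brown).
index = {
    "quick": {"D1": [1]},
    "brown": {"D1": [2], "D7": [5]},
    "fox": {"D1": [3], "D7": [2]},
}
print(phrase_match(index, ["brown", "fox"]))  # ['D1']
```

Converting the later terms' positions to sets keeps each membership test cheap; production engines instead walk the sorted position lists in step, which avoids building sets.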

A proximity query such as "brown fox"~3 (Lucene query syntax) relaxes the adjacency requirement: the two terms need only occur within 3 positions of each other.
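A proximity check can be sketched as a two-pointer merge over two sorted position lists. This is a simplified, order-insensitive reading of "within k positions" (distance taken as the absolute difference of positions); the function name within is hypothetical, and Lucene's own slop is defined differently, as a count of position moves, so results can differ from Lucene's for reversed term order.

```python
def within(positions_a: list[int], positions_b: list[int], k: int) -> bool:
    """True if some occurrence of term A is at most k positions from some occurrence of B.

    Both lists must be sorted ascending, as positions in a postings list are.
    Order-insensitive: A may come before or after B.
    """
    i = j = 0
    while i < len(positions_a) and j < len(positions_b):
        a, b = positions_a[i], positions_b[j]
        if abs(a - b) <= k:
            return True
        # Drop the smaller position: every remaining position on the other
        # side is even farther from it, so it cannot take part in a match.
        if a < b:
            i += 1
        else:
            j += 1
    return False


print(within([4], [1], 3))  # True: 4 - 1 = 3
print(within([4], [1], 2))  # False
```

Because each step advances one pointer, the check runs in time linear in the combined length of the two position lists.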

Example

Corpus:

  • D1: "the quick brown fox jumped"
  • D2: "a brown dog ran away"
  • D3: "the fox and the brown bear"

Positional postings for "brown" and "fox" (positions are zero-based token offsets):

  Term    Doc   Positions
  brown   D1    [2]
  brown   D2    [1]
  brown   D3    [4]
  fox     D1    [3]
  fox     D3    [1]

Phrase query "brown fox":

  • D1: brown at 2, fox at 3 → 3 == 2+1 → MATCH
  • D2: no "fox" → no match
  • D3: brown at 4, fox at 1 → 1 ≠ 4+1 → no match

Result: only D1.
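The worked example can be reproduced with a short Python script. It is a sketch that assumes plain whitespace tokenization, which suffices for this corpus, and a nested dict as the index. It prints the postings table above and then evaluates the two-term phrase "brown fox" by checking for a "fox" position equal to a "brown" position plus one.

```python
from collections import defaultdict

corpus = {
    "D1": "the quick brown fox jumped",
    "D2": "a brown dog ran away",
    "D3": "the fox and the brown bear",
}

# term -> doc ID -> list of zero-based token positions (ascending by construction)
index: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
for doc_id, text in corpus.items():
    for pos, token in enumerate(text.split()):
        index[token][doc_id].append(pos)

for term in ("brown", "fox"):
    for doc_id, positions in index[term].items():
        print(term, doc_id, positions)

# Phrase "brown fox": candidate docs contain both terms, and some fox
# position must equal some brown position + 1.
result = sorted(
    doc_id
    for doc_id in index["brown"].keys() & index["fox"].keys()
    if any(p + 1 in index["fox"][doc_id] for p in index["brown"][doc_id])
)
print(result)  # ['D1']
```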

Variants and history

Positional indexes are a long-established information retrieval technique and are described in detail by Manning, Raghavan, and Schütze in Introduction to Information Retrieval (2008). Widely used search libraries such as Lucene and Tantivy index positions by default for full-text fields.

Position storage is controlled in Lucene via IndexOptions:

  Setting                                    Stores
  DOCS                                       Document IDs only
  DOCS_AND_FREQS                             Document IDs + term frequencies
  DOCS_AND_FREQS_AND_POSITIONS               + positions: full positional index (default for text fields)
  DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS   + character offsets (for highlighting)

When to use it

Positions are indexed by default for text fields in Elasticsearch, OpenSearch, and Solr. In Elasticsearch, index_options: freqs drops positions but keeps the term frequencies that BM25 scoring uses, while index_options: docs drops both. Turn positions off only when:

  • The field will never be used in phrase or proximity queries.
  • Reducing index size is a priority.
  • The field holds high-cardinality, keyword-like values for which term matching and BM25 scoring are sufficient without phrase support.

Disabling positions reduces index size, by roughly 30–40% for average text fields; the actual saving depends on field length and vocabulary, so it is worth measuring on your own data.
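As a sketch, an Elasticsearch mapping that uses the three main levels might look like the following, written as a Python dict and serialised to the JSON request body. The index name my-index and the field names body, summary, and tags are hypothetical; the index_options values freqs and docs are Elasticsearch's documented settings, and omitting the parameter leaves the text-field default (positions).

```python
import json

# Hypothetical index with three text fields at different index_options levels.
mapping = {
    "mappings": {
        "properties": {
            # Default for text fields: doc IDs, freqs and positions (phrase queries work).
            "body": {"type": "text"},
            # Keeps term frequencies for BM25 scoring but drops positions.
            "summary": {"type": "text", "index_options": "freqs"},
            # Doc IDs only: no frequencies, no positions.
            "tags": {"type": "text", "index_options": "docs"},
        }
    }
}

# Request body for creating the index, e.g. with PUT /my-index.
print(json.dumps(mapping, indent=2))
```

Phrase or proximity queries against summary or tags would then fail, because those fields no longer store positions.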

See also