Positional Index

What it is

A positional index is an inverted index that stores not just which documents contain each term but also where: the token position of every occurrence. This position information enables phrase queries ("quick brown fox", words in exact sequence) and proximity queries ("quick" near "fox", words within N tokens of each other).

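Without positions, an index records only whether (and how often) a term occurs in a document, so each term is matched independently of the others. Word order and distance cannot be checked, which is why phrase and proximity queries require positional data.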

How it works

Each posting in a positional index carries a list of positions:

term "fox":
  → (docID=1, freq=2, positions=[3, 9])
  → (docID=4, freq=1, positions=[0])
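
As a rough illustration of the layout above, a posting can be modelled as a document ID plus a list of positions, with the frequency derived from that list. The `Posting` class and the in-memory dictionary are assumptions made for this sketch, not the on-disk format of any particular engine.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Posting:
    """One entry in a term's postings list (hypothetical in-memory model)."""
    doc_id: int
    positions: List[int] = field(default_factory=list)

    @property
    def freq(self) -> int:
        # Term frequency is simply the number of recorded positions.
        return len(self.positions)


# Postings list for "fox", kept sorted by doc_id as in the listing above.
postings = {"fox": [Posting(1, [3, 9]), Posting(4, [0])]}

for p in postings["fox"]:
    print(f"docID={p.doc_id}, freq={p.freq}, positions={p.positions}")
```

Deriving `freq` from the position list keeps the two values from drifting apart in this sketch. Real engines typically store the frequency explicitly so that scoring can skip decoding positions.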

A phrase query "brown fox" is evaluated by:

1. Fetching the postings lists for "brown" and "fox".
2. Finding documents containing both terms (standard AND intersection).
3. For each candidate document, checking whether some position of "fox" is exactly one greater than some position of "brown", i.e. the two terms are adjacent and in order.

D1: "brown" at [2], "fox" at [3]
Check: 3 == 2 + 1  → YES, phrase matches
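
The evaluation steps above can be sketched for a two-term phrase as follows. This is a minimal sketch that assumes each term's postings are a dictionary from doc ID to a sorted list of positions; the function names and the sample postings are hypothetical.

```python
def has_adjacent(first, second):
    """True if some position in `second` equals a position in `first` plus one.

    Both lists are assumed to be sorted in ascending order, so a single
    linear merge pass is enough.
    """
    i = j = 0
    while i < len(first) and j < len(second):
        target = first[i] + 1
        if second[j] == target:
            return True
        if second[j] < target:
            j += 1  # this occurrence of the second term is too early
        else:
            i += 1  # no occurrence directly follows first[i]
    return False


def phrase_docs(postings_a, postings_b):
    """Doc IDs where term A is immediately followed by term B."""
    # AND intersection of the two doc-ID sets.
    shared = sorted(set(postings_a) & set(postings_b))
    # Positional check on each candidate document.
    return [d for d in shared if has_adjacent(postings_a[d], postings_b[d])]


brown = {1: [2], 7: [5, 11]}
fox = {1: [3], 7: [0, 9]}
print(phrase_docs(brown, fox))  # [1]
```

Document 7 contains both terms but never has "fox" directly after "brown", so it passes the intersection and fails the positional check.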

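A proximity query (written "brown fox"~3 in Lucene query syntax) relaxes the adjacency requirement: the terms only need to occur within 3 positions of each other. Exactly how that distance is counted varies between engines.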
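
One simple way to picture the relaxed check is to compute the smallest gap between any occurrence of one term and any occurrence of the other, then compare it with the allowed distance. The sketch below assumes an unordered, absolute-distance definition of "within k positions"; Lucene defines slop as an edit distance over term positions, so its results can differ. The helper names are hypothetical.

```python
def min_gap(a, b):
    """Smallest absolute difference between a position in `a` and one in `b`.

    Both lists are assumed sorted ascending. Returns None if either is empty.
    """
    i = j = 0
    best = None
    while i < len(a) and j < len(b):
        gap = abs(a[i] - b[j])
        if best is None or gap < best:
            best = gap
        # Advance the pointer at the smaller position to try to close the gap.
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return best


def within(a, b, k):
    """True if some occurrence pair is at most k positions apart."""
    gap = min_gap(a, b)
    return gap is not None and gap <= k


brown_positions = [2, 20]
fox_positions = [9, 22]
print(min_gap(brown_positions, fox_positions))    # 2
print(within(brown_positions, fox_positions, 3))  # True
print(within(brown_positions, fox_positions, 1))  # False
```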

Example

Corpus:

D1: "the quick brown fox jumped"
D2: "a brown dog ran away"
D3: "the fox and the brown bear"

Positional postings for "brown" and "fox":

Term	Doc	Positions
brown	D1	[2]
brown	D2	[1]
brown	D3	[4]
fox	D1	[3]
fox	D3	[1]

Phrase query "brown fox":

D1: brown at 2, fox at 3 → 3 == 2+1 → MATCH
D2: no "fox" → no match
D3: brown at 4, fox at 1 → 1 ≠ 4+1 → no match

Result: only D1.
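
The worked example can be reproduced end to end with a small script. This is a sketch only: it assumes lowercase whitespace tokenization and 0-based positions, and `build_index` and `phrase_query` are illustrative names rather than any library's API.

```python
from collections import defaultdict


def build_index(docs):
    """Map term -> {doc_id: [0-based token positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index


def phrase_query(index, phrase):
    """Doc IDs where the words of `phrase` occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    # Candidate documents must contain every term.
    candidates = set.intersection(*(set(index[t]) for t in terms))
    hits = []
    for doc_id in sorted(candidates):
        rest = [set(index[t][doc_id]) for t in terms[1:]]
        for start in index[terms[0]][doc_id]:
            # The i-th following term must sit exactly i positions after `start`.
            if all(start + i in positions
                   for i, positions in enumerate(rest, start=1)):
                hits.append(doc_id)
                break
    return hits


corpus = {
    "D1": "the quick brown fox jumped",
    "D2": "a brown dog ran away",
    "D3": "the fox and the brown bear",
}
index = build_index(corpus)
print(index["brown"])  # {'D1': [2], 'D2': [1], 'D3': [4]}
print(index["fox"])    # {'D1': [3], 'D3': [1]}
print(phrase_query(index, "brown fox"))  # ['D1']
```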

Variants and history

Positional indexes were formalised in the IR literature in the 1990s and are described in detail by Manning, Raghavan, and Schütze in Introduction to Information Retrieval (2008). Full-text search libraries such as Lucene and Tantivy store positions by default for tokenized text fields.

Position storage is controlled in Lucene via IndexOptions:

Setting	Stores
`DOCS`	Document ID only
`DOCS_AND_FREQS`	Doc + term frequency
`DOCS_AND_FREQS_AND_POSITIONS`	Full positional index (default for `text` fields)
`DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS`	Positions + character offsets (for highlighting)

When to use it

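The positional index is enabled by default for text fields in Elasticsearch, OpenSearch, and Solr. Turn it off (in Elasticsearch, index_options: freqs keeps term frequencies for scoring, while index_options: docs drops them as well) only when:

The field will never be used in phrase or proximity queries.
Reducing index size is a priority.
The field holds short, keyword-like values for which term-level matching is sufficient without phrase support. BM25 still relies on term frequencies, so keep at least freqs if relevance scoring on the field matters.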

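As a rough guide, disabling positions reduces index size by approximately 30–40% for average text fields; the actual saving depends on document length, the analyzer, and the postings compression in use.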
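
For illustration, an Elasticsearch mapping that keeps positions on one field and drops them on another might look like the sketch below, written as a Python dictionary that serializes to the JSON request body. The field names `body` and `tags_text` are made up for this example. `index_options` is the Elasticsearch mapping parameter and accepts `docs`, `freqs`, `positions`, and `offsets`.

```python
import json

# Hypothetical mapping: field names are placeholders, not from a real index.
mapping = {
    "mappings": {
        "properties": {
            # Default for text fields: positions are indexed,
            # so phrase and proximity queries work.
            "body": {"type": "text"},
            # Positions dropped, term frequencies kept for scoring.
            # Phrase queries on this field are no longer supported.
            "tags_text": {"type": "text", "index_options": "freqs"},
        }
    }
}

# JSON body that would be sent when creating the index.
print(json.dumps(mapping, indent=2))
```

Changing `index_options` on an existing field generally requires reindexing, so it is worth deciding per field before the first bulk load.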