Positional Index
What it is
A positional index is an inverted index that stores not just which documents contain each term, but also where — the token offsets at which each occurrence appears. This position information enables phrase queries ("quick brown fox" — words in exact sequence) and proximity queries ("quick" near "fox" — words within N tokens of each other).
Without positions, an index can only report which documents contain each term; it cannot tell whether those terms occur next to, or near, each other.
How it works
Each posting in a positional index carries a list of positions:
term "fox":
→ (docID=1, freq=2, positions=[3, 9])
→ (docID=4, freq=1, positions=[0])
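A minimal sketch of how such postings could be built, assuming lowercase whitespace tokenization and an in-memory dictionary. The function name `build_positional_index` and the two sample documents are illustrative, not taken from any particular engine:

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map each term to {doc_id: [token offsets]} using whitespace tokenization."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for position, token in enumerate(text.lower().split()):
            # Positions are appended in increasing order, so each list stays sorted.
            index[token].setdefault(doc_id, []).append(position)
    return index

# Hypothetical documents, chosen so that "fox" gets the postings shown above.
docs = {
    1: "the quick brown fox and the lazy dog chased fox",
    4: "fox tracks in snow",
}
index = build_positional_index(docs)

for doc_id, positions in sorted(index["fox"].items()):
    # Term frequency is simply the length of the position list.
    print(f"docID={doc_id}, freq={len(positions)}, positions={positions}")
```

Production engines typically store the same information in compressed form on disk; the nested dictionary here only illustrates the logical layout.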
A phrase query "brown fox" is evaluated by:
- Fetching the postings lists for "brown" and "fox".
- Finding documents containing both terms (standard AND intersection).
- For each candidate document, checking whether any position of "fox" is exactly one greater than any position of "brown", i.e., they are adjacent.
D1: "brown" at [2], "fox" at [3]
Check: 3 == 2 + 1 → YES, phrase matches
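The three steps can be sketched for a two-term phrase as follows. This assumes each postings list is a dictionary from document ID to a sorted list of positions; `has_adjacent`, `phrase_docs`, and the sample postings are hypothetical names and data used only for illustration:

```python
def has_adjacent(first, second):
    """Return True if some position in `second` is exactly one past a position in `first`.

    Both lists must be sorted in ascending order.
    """
    i = j = 0
    while i < len(first) and j < len(second):
        target = first[i] + 1
        if second[j] == target:
            return True
        if second[j] < target:
            j += 1  # this occurrence is too early to follow first[i] or anything after it
        else:
            i += 1  # first[i] has no follower; try the next occurrence
    return False

def phrase_docs(postings_a, postings_b):
    """Documents in which term A is immediately followed by term B."""
    candidates = postings_a.keys() & postings_b.keys()  # AND intersection
    return sorted(d for d in candidates if has_adjacent(postings_a[d], postings_b[d]))

# Hypothetical postings: doc 1 has the phrase, doc 7 has both terms but not adjacent.
brown = {1: [2], 7: [0, 5]}
fox = {1: [3], 7: [3, 9], 8: [1]}
print(phrase_docs(brown, fox))
```

Because both position lists are sorted, the merge inspects each position at most once instead of comparing every pair.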
A proximity query ("brown fox"~3) relaxes the adjacency requirement: the terms only need to occur within 3 positions of each other.
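A proximity check over two sorted position lists might look like the sketch below. It assumes distance is the absolute difference between positions and that term order does not matter; real engines may define the slop parameter differently, so treat `within_k` as a hypothetical helper rather than any engine's exact semantics:

```python
def within_k(first, second, k):
    """True if some pair of positions from the two sorted lists differs by at most k."""
    i = j = 0
    while i < len(first) and j < len(second):
        if abs(first[i] - second[j]) <= k:
            return True
        # Advance the pointer at the smaller position; keeping it could only widen the gap.
        if first[i] < second[j]:
            i += 1
        else:
            j += 1
    return False

# "brown" at 4 and "fox" at 1 are 3 positions apart.
print(within_k([4], [1], 3))
print(within_k([4], [1], 2))
```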
Example
Corpus:
- D1: "the quick brown fox jumped"
- D2: "a brown dog ran away"
- D3: "the fox and the brown bear"
Positional postings for "brown" and "fox":
| Term | Doc | Positions |
|---|---|---|
| brown | D1 | [2] |
| brown | D2 | [1] |
| brown | D3 | [4] |
| fox | D1 | [3] |
| fox | D3 | [1] |
Phrase query "brown fox":
- D1: brown at 2, fox at 3 → 3 == 2+1 → MATCH
- D2: no "fox" → no match
- D3: brown at 4, fox at 1 → 1 ≠ 4+1 → no match
Result: only D1.
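The worked example can be reproduced with a short script. This is a sketch under the assumption of whitespace tokenization; it stores positions as sets and generalizes the check to phrases of any length by testing whether term i appears at offset start + i. The name `phrase_query` is illustrative:

```python
from collections import defaultdict

corpus = {
    "D1": "the quick brown fox jumped",
    "D2": "a brown dog ran away",
    "D3": "the fox and the brown bear",
}

# term -> doc -> set of token offsets
index = defaultdict(lambda: defaultdict(set))
for doc_id, text in corpus.items():
    for pos, token in enumerate(text.split()):
        index[token][doc_id].add(pos)

def phrase_query(phrase):
    terms = phrase.split()
    if not terms:
        return []
    # Candidate documents must contain every term of the phrase.
    candidates = set(corpus)
    for term in terms:
        candidates &= set(index.get(term, {}))
    hits = []
    for doc_id in sorted(candidates):
        starts = index[terms[0]][doc_id]
        # The phrase matches if, from some start, term i sits at start + i.
        if any(all(start + i in index[t][doc_id] for i, t in enumerate(terms))
               for start in starts):
            hits.append(doc_id)
    return hits

print(phrase_query("brown fox"))
```

Running the same function with "the brown bear" exercises the multi-term case, where only the second occurrence of "the" in D3 starts a match.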
Variants and history
Positional indexes were formalised in the IR literature in the 1990s and are described in detail by Manning, Raghavan, and Schütze in Introduction to Information Retrieval (2008). Search libraries such as Lucene and Tantivy index positions for their standard full-text field types by default.
Position storage is controlled in Lucene via IndexOptions:
| Setting | Stores |
|---|---|
| DOCS | Document ID only |
| DOCS_AND_FREQS | Doc + term frequency |
| DOCS_AND_FREQS_AND_POSITIONS | Full positional index (default for text fields) |
| DOCS_AND_FREQS_AND_POSITIONS_AND_OFFSETS | Positions + character offsets (for highlighting) |
When to use it
The positional index is enabled by default for text fields in Elasticsearch, OpenSearch, and Solr. Turn it off (index_options: docs in Elasticsearch) only when:
- The field will never be used in phrase or proximity queries.
- Reducing index size is a priority.
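- You are indexing high-cardinality keyword-like fields where plain term matching is sufficient without phrase support. If BM25 scoring still matters for the field, use index_options: freqs instead, because docs also drops term frequencies.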
Disabling positions reduces index size by approximately 30–40% for average text fields, though the actual saving varies with the corpus and the analyzer.
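As an illustration, an Elasticsearch mapping that keeps positions on one field and drops them on another could be expressed as the request body below. The field names `body` and `log_line` are hypothetical, and the snippet only builds and prints the JSON; it does not call any client library:

```python
import json

# Hypothetical mapping body for an index-creation request.
mapping = {
    "mappings": {
        "properties": {
            # No index_options given: text fields index positions by default.
            "body": {"type": "text"},
            # "docs" stores document IDs only; "freqs" would also keep term frequencies.
            "log_line": {"type": "text", "index_options": "docs"},
        }
    }
}

print(json.dumps(mapping, indent=2))
```

Phrase and proximity queries against a field mapped this way would no longer have the position data they need, so the setting is best reserved for fields that match the criteria above.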