Postings List
What it is
A postings list is the complete record for one term in an inverted index. It contains, in sorted order by document ID, every document in which the term appears. Each entry in the list is a posting.
The postings list is the primary structure retrieved during query evaluation. When you search for "fox", the engine looks up the postings list for "fox" and returns the matching documents — potentially intersecting or unioning multiple postings lists for multi-term queries.
How it works
Postings lists are stored in sorted order by document ID. This ordering is critical: it enables efficient set operations (intersection for AND queries, union for OR queries) using a merge algorithm that runs in O(n) time.
Boolean AND — two-pointer merge:
list A: [2, 5, 11, 17]
list B: [3, 5, 9, 11, 22]
advance both pointers until both point to the same ID:
→ intersection: [5, 11]
Boolean OR — merge all unique IDs:
→ union: [2, 3, 5, 9, 11, 17, 22]
On disk, Lucene encodes postings lists using a combination of:
- Delta encoding — each doc ID stored as the gap from the previous (e.g. [2, 5, 11] → [2, 3, 6]).
- Variable-byte encoding — small numbers take fewer bytes.
- Frame of Reference (FOR) / PFOR — block-level compression of 128-posting chunks.
Skip pointers at regular intervals allow the merge algorithm to jump ahead in a long list without scanning every entry — important for long lists (stop words, very common terms) being intersected with short ones.
[illustrate: two postings lists for “quick” and “fox” side by side — sorted doc IDs shown as labelled boxes — two-pointer AND merge animated step by step, landing on the shared IDs [D5, D11]]
Example
Corpus of five documents:
| DocID | Text |
|---|---|
| 1 | “the quick brown fox” |
| 2 | “a quick brown dog” |
| 3 | “the fox ran away” |
| 4 | “a dog and a fox” |
| 5 | “quick quick quick” |
Postings lists (no stemming, no stopwords):
| Term | Postings list |
|---|---|
quick |
[1, 2, 5] |
fox |
[1, 3, 4] |
brown |
[1, 2] |
dog |
[2, 4] |
Query "quick AND fox" → intersect [1, 2, 5] ∩ [1, 3, 4] → [1]
Variants and history
The postings list structure was central to early IR systems including the Cranfield experiments (1960s) and Gerard Salton’s SMART system (1970s). The data structure and its compression techniques were formalised by Zobel & Moffat in their influential 2006 survey.
Notable variants:
- Impact-ordered lists — sorted by term weight (TF×IDF) rather than document ID; enables early termination in top-k BM25 retrieval (WAND and BMW algorithms).
- Elias-Fano coding — a succinct representation supporting random access and next-greater-than queries in O(1) amortised time.
- Roaring bitmaps — used in some systems for short-range postings lists where bitset operations outperform merge algorithms.
When to use it
The postings list is an internal implementation detail in all major search engines — you interact with it indirectly via query types and index settings:
- Boolean queries — directly evaluate postings list intersections/unions.
- Phrase queries — require positional postings lists; disabling positions (
index_options: docs) saves space but disables phrase queries. - Long postings lists (stop words) — may be pruned using minimum term frequency thresholds, or handled with skip pointers.
Understanding postings list length is useful for query performance tuning: intersecting a 10-posting list with a 10-million-posting list is fast (skip pointers), but unioning two 10-million lists is slow.