Posting
What it is
A posting is the atomic unit of an inverted index. It represents one occurrence relationship: this term appears in this document. A collection of postings for the same term forms a postings list.
In its simplest form, a posting is just a document ID. In richer implementations it also carries:
- Term frequency (TF) — how many times the term appears in the document.
- Position list — the token offsets where the term appears, enabling phrase and proximity queries.
- Payload — arbitrary per-occurrence data (e.g. boosting weights, part-of-speech tags) stored inline with positions.
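A minimal in-memory sketch of these fields (illustrative only; not any engine's actual classes):

```python
from dataclasses import dataclass, field

@dataclass
class Posting:
    doc_id: int                 # which document the term occurs in
    tf: int = 0                 # term frequency within that document
    positions: list[int] = field(default_factory=list)  # token offsets

# A postings list is all postings for one term; a dict keyed by term
# stands in for the term dictionary here.
postings_lists: dict[str, list[Posting]] = {
    "fox": [Posting(doc_id=42, tf=2, positions=[3, 9])],
}
```

Payloads would hang off the positions list, one entry per occurrence; they are omitted here for brevity.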
How it works
When a document is indexed, the analysis chain produces a stream of tokens. For each token, the indexer creates or updates a posting in the term’s postings list:
```
token "fox" at position 3 in doc D42
  → upsert postings list for "fox"
  → add posting: (docID=42, tf=1, positions=[3])
```
If “fox” appears again in D42 (say at position 9), the posting is updated in place:
```
  → update posting: (docID=42, tf=2, positions=[3, 9])
```
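The create-or-update flow above can be sketched as a toy indexer (the helper name `add_token` is hypothetical; real engines buffer these structures per segment and flush them to disk):

```python
from collections import defaultdict

# term -> {doc_id -> {"tf": int, "positions": [int]}}
index: dict[str, dict[int, dict]] = defaultdict(dict)

def add_token(term: str, doc_id: int, position: int) -> None:
    """Create the posting for (term, doc_id) if absent, else update it in place."""
    posting = index[term].setdefault(doc_id, {"tf": 0, "positions": []})
    posting["tf"] += 1
    posting["positions"].append(position)

add_token("fox", 42, 3)
add_token("fox", 42, 9)   # same document: existing posting is updated
# index["fox"][42] == {"tf": 2, "positions": [3, 9]}
```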
On disk, Lucene-based engines encode postings in variable-length compressed blocks. Document IDs are delta-encoded (each ID stored as the difference from the previous), and positions are similarly delta-encoded within each document. This compression dramatically reduces index size and improves sequential scan performance.
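Delta encoding pairs naturally with variable-length byte encoding, since small gaps fit in a single byte. A sketch of both steps (LEB128-style varints; actual Lucene block formats are more elaborate):

```python
def delta_encode(sorted_ids: list[int]) -> list[int]:
    """Store each doc ID as the gap from the previous one."""
    return [cur - prev for prev, cur in zip([0] + sorted_ids, sorted_ids)]

def varint(n: int) -> bytes:
    """Variable-length encoding: 7 bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        out.append((n & 0x7F) | (0x80 if n >> 7 else 0))
        n >>= 7
        if not n:
            return bytes(out)

gaps = delta_encode([7, 42, 88])              # → [7, 35, 46]
encoded = b"".join(varint(g) for g in gaps)   # each gap fits in one byte here
```

Because gaps are much smaller than raw doc IDs, long postings lists compress well, and decoding is a fast sequential scan.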
[illustrate: one postings list for the term “fox” — three postings shown as rows: (D7, tf=1, pos=[2]), (D42, tf=2, pos=[3,9]), (D88, tf=1, pos=[0]) — with the docID delta encoding shown below: 7, 35, 46]
Example
After indexing three documents:
- D1: "the fox jumped over the fox"
- D2: "a quick fox"
- D3: "no foxes here"
Postings for "fox" (after lowercasing, stemming skipped for clarity):
| docID | TF | Positions |
|---|---|---|
| 1 | 2 | [1, 5] |
| 2 | 1 | [2] |
D3 has no posting for "fox" because it contains "foxes" — the exact form "fox" is absent (unless a stemmer is applied, which would conflate them).
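The stored positions are what make phrase queries possible: the phrase "quick fox" matches in D2 because some occurrence of "fox" sits exactly one position after an occurrence of "quick". A minimal sketch of that check:

```python
def phrase_match(positions_a: list[int], positions_b: list[int]) -> bool:
    """True if some occurrence of term B directly follows term A."""
    next_positions = {p + 1 for p in positions_a}
    return any(p in next_positions for p in positions_b)

# In D2 ("a quick fox"): "quick" at [1], "fox" at [2]
phrase_match([1], [2])   # "quick fox" → True
phrase_match([2], [1])   # "fox quick" → False
```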
Variants and history
The term “posting” comes from library science — a document posted under a heading in a card index. It entered IR vocabulary through Gerard Salton’s SMART system (developed in the 1960s) and his foundational work on vector space retrieval.
Modern variations:
- Impact-ordered postings — postings sorted by TF×IDF score rather than document ID, enabling early termination in top-k retrieval.
- Position-less postings — omit position information to save space when phrase queries are not needed.
- Skip pointers — inserted at fixed intervals in long postings lists to enable O(√n) skipping during boolean intersection.
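Skip pointers pay off during boolean AND: when the current doc IDs differ, the lagging list can jump ahead by roughly √n entries instead of advancing one at a time. A sketch over plain sorted lists (real indexes store the skip pointers inside the postings list itself; here a computed stride stands in for them):

```python
import math

def intersect_with_skips(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted doc-ID lists, skipping ahead in strides of ~sqrt(n)."""
    skip_a = max(1, math.isqrt(len(a)))
    skip_b = max(1, math.isqrt(len(b)))
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # take the skip only if it doesn't overshoot the other list
            if i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            else:
                i += 1
        else:
            if j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            else:
                j += 1
    return result

intersect_with_skips([2, 4, 8, 16, 32, 64], [1, 2, 3, 8, 64, 128])  # → [2, 8, 64]
```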
When to use it
Understanding postings matters most when:
- Tuning index size — disabling position storage (setting `index_options: docs` and turning off `term_vectors`) eliminates position postings, which can substantially shrink the index for fields that don’t need phrase queries.
- Debugging relevance — the `explain` API in Elasticsearch surfaces per-posting TF values, making postings visible in scoring.
- Writing custom Lucene code — direct use of `PostingsEnum` and `LeafReader` requires understanding posting structure.