Posting

What it is

A posting is the atomic unit of an inverted index. It represents one occurrence relationship: this term appears in this document. A collection of postings for the same term forms a postings list.

In its simplest form, a posting is just a document ID. In richer implementations it also carries:

  • Term frequency (TF) — how many times the term appears in the document.
  • Position list — the token positions at which the term appears, enabling phrase and proximity queries. (Positions count tokens; character offsets are a separate, optional piece of data.)
  • Payload — arbitrary per-occurrence data (e.g. boosting weights, part-of-speech tags) stored inline with positions.
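A richer posting can be sketched as a small record; the field names here are illustrative, not any engine's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class Posting:
    """One term-in-document occurrence record (illustrative field names)."""
    doc_id: int
    tf: int = 0
    positions: list[int] = field(default_factory=list)
    payloads: dict[int, bytes] = field(default_factory=dict)  # position -> payload

# A postings list is simply a sequence of these, sorted by doc_id.
postings_list = [
    Posting(doc_id=7, tf=1, positions=[2]),
    Posting(doc_id=42, tf=2, positions=[3, 9]),
]
```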

How it works

When a document is indexed, the analysis chain produces a stream of tokens. For each token, the indexer creates or updates a posting in the term’s postings list:

token "fox" at position 3 in doc D42
  → upsert postings list for "fox"
  → add posting: (docID=42, tf=1, positions=[3])

If “fox” appears again in D42 (say at position 9), the posting is updated in place:

  → update posting: (docID=42, tf=2, positions=[3, 9])
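The upsert-then-update flow above can be sketched with plain dictionaries. This is a toy in-memory index, not Lucene's actual data structures; TF falls out as the length of the position list:

```python
from collections import defaultdict

# index: term -> {doc_id -> [positions]}; tf is len(positions)
index = defaultdict(lambda: defaultdict(list))

def add_token(term: str, doc_id: int, position: int) -> None:
    """Create or update the posting for (term, doc_id) with one occurrence."""
    index[term][doc_id].append(position)

# "fox" at position 3, then again at position 9, in doc 42:
add_token("fox", 42, 3)
add_token("fox", 42, 9)

positions = index["fox"][42]   # [3, 9]
tf = len(positions)            # 2
```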

On disk, Lucene-based engines encode postings in variable-length compressed blocks. Document IDs are delta-encoded (each ID stored as the difference from the previous), and positions are similarly delta-encoded within each document. This compression dramatically reduces index size and improves sequential scan performance.
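Delta encoding plus a variable-length byte code can be sketched as follows. The varint here is LEB128-style — the same spirit as, but not identical to, Lucene's on-disk format, which also uses block-level packed encodings:

```python
def delta_encode(sorted_ids: list[int]) -> list[int]:
    """Store each doc ID as the gap from the previous one."""
    out, prev = [], 0
    for doc_id in sorted_ids:
        out.append(doc_id - prev)
        prev = doc_id
    return out

def varint(n: int) -> bytes:
    """LEB128-style encoding: 7 value bits per byte, high bit = continuation."""
    out = bytearray()
    while True:
        b = n & 0x7F
        n >>= 7
        if n:
            out.append(b | 0x80)
        else:
            out.append(b)
            return bytes(out)

doc_ids = [7, 42, 88]
deltas = delta_encode(doc_ids)          # [7, 35, 46]
encoded = b"".join(varint(d) for d in deltas)
```

Small gaps fit in one byte, which is why delta encoding pays off: dense postings lists have mostly tiny gaps.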

[illustrate: one postings list for the term “fox” — three postings shown as rows: (D7, tf=1, pos=[2]), (D42, tf=2, pos=[3,9]), (D88, tf=1, pos=[0]) — with the docID delta encoding shown below: 7, 35, 46]

Example

After indexing three documents:

  • D1: "the fox jumped over the fox"
  • D2: "a quick fox"
  • D3: "no foxes here"

Postings for "fox" (after lowercasing, stemming skipped for clarity):

docID  TF  Positions
1      2   [1, 5]
2      1   [2]

D3 has no posting for "fox" because it contains "foxes" — the exact form "fox" is absent (unless a stemmer is applied, which would conflate them).
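The whole example can be reproduced with a few lines of toy indexing code — whitespace tokenization and lowercasing only, no stemming:

```python
docs = {
    1: "the fox jumped over the fox",
    2: "a quick fox",
    3: "no foxes here",
}

# index: term -> {doc_id -> [positions]}
index: dict[str, dict[int, list[int]]] = {}
for doc_id, text in docs.items():
    for pos, token in enumerate(text.lower().split()):
        index.setdefault(token, {}).setdefault(doc_id, []).append(pos)

# index["fox"] == {1: [1, 5], 2: [2]}; D3 contributes nothing,
# since "foxes" is a different term without stemming.
```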

Variants and history

The term “posting” comes from library science — a document posted to a heading in a card index. It entered IR vocabulary through Gerard Salton’s SMART system (developed in the 1960s) and his foundational 1975 work on vector space retrieval.

Modern variations:

  • Impact-ordered postings — postings sorted by TF×IDF score rather than document ID, enabling early termination in top-k retrieval.
  • Position-less postings — omit position information to save space when phrase queries are not needed.
  • Skip pointers — inserted at fixed intervals in long postings lists to enable O(√n) skipping during boolean intersection.
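Skip-aided intersection over two sorted doc-ID lists can be sketched like this, with skip spans of roughly √n. Real engines store skip data as a separate layered structure; here the skip is simulated by jumping ahead in the array when the target is known to be safe:

```python
import math

def intersect_with_skips(a: list[int], b: list[int]) -> list[int]:
    """Intersect two sorted doc-ID lists, skipping ~sqrt(n) entries at a time."""
    skip_a = max(1, math.isqrt(len(a)))
    skip_b = max(1, math.isqrt(len(b)))
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # Take the skip only if it cannot overshoot the current target.
            if i + skip_a < len(a) and a[i + skip_a] <= b[j]:
                i += skip_a
            else:
                i += 1
        else:
            if j + skip_b < len(b) and b[j + skip_b] <= a[i]:
                j += skip_b
            else:
                j += 1
    return out

# intersect_with_skips([1, 3, 5, 7, 9, 11], [2, 3, 7, 10, 11]) -> [3, 7, 11]
```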

When to use it

Understanding postings matters most when:

  • Tuning index size — in Elasticsearch, setting index_options: docs keeps only document IDs in the postings, omitting frequencies and positions; for fields that don’t need phrase queries this can shrink the index substantially. (Term vectors are a separate, optional structure, off by default.)
  • Debugging relevance — the explain API in Elasticsearch surfaces per-posting TF values, making postings visible in scoring.
  • Writing custom Lucene code — direct use of PostingsEnum and LeafReader requires understanding posting structure.

See also