Data Structures
-
Term Vector
A term vector is a per-document, per-field record of the terms, their frequencies, and optionally their positions and character offsets, as produced by index-time analysis. It supports highlighting and More Like This queries, and gives forward-index-style access to the terms of a single document.
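As a rough illustration, the term vector for one field of one document can be modelled as a map from each term to its frequency, positions, and character offsets. The regex tokenizer and the `term_vector` helper are assumptions made for this sketch, not Lucene's API.

```python
import re
from collections import defaultdict

def term_vector(text):
    """Return {term: {"freq", "positions", "offsets"}} for one field value."""
    vector = defaultdict(lambda: {"freq": 0, "positions": [], "offsets": []})
    # Toy analysis: lowercase word tokens; a token's position is its ordinal.
    for position, match in enumerate(re.finditer(r"\w+", text.lower())):
        entry = vector[match.group()]
        entry["freq"] += 1
        entry["positions"].append(position)
        # Offsets are (start, end) character indices into the field value.
        entry["offsets"].append((match.start(), match.end()))
    return dict(vector)

tv = term_vector("To be or not to be")
```

A highlighter can use the offsets recorded for a matched term to wrap the right characters of the original text without analysing it again.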
-
Stored Field
A stored field retains the original verbatim value of a field in the index so it can be returned in search results. Stored fields are separate from the inverted index and from DocValues.
-
Segment
A segment is an immutable, self-contained unit of a Lucene index. New documents are written to new segments, and segments are periodically merged to limit how many a search must visit and to reclaim space held by deleted documents.
-
Postings List
A postings list is the sequence of postings for a single term in an inverted index, ordered by document ID: every document containing that term, with optional frequencies and positions.
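Keeping each list sorted by document ID is what makes boolean queries cheap. A minimal sketch of an AND query as a linear merge of two postings lists follows; the doc IDs are made up, and frequencies and positions are omitted.

```python
def intersect(a, b):
    """Intersect two postings lists sorted by ascending document ID."""
    i = j = 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result

search = [1, 4, 7, 9]   # hypothetical doc IDs containing "search"
engine = [2, 4, 9, 12]  # hypothetical doc IDs containing "engine"
both = intersect(search, engine)
```

Each list is walked once, so the cost is proportional to the combined length of the two lists.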
-
Posting
A posting is a single record in an inverted index, linking a term to one document in which it appears — optionally including term frequency and token positions.
-
Positional Index
A positional index extends the inverted index by storing the token position of each term occurrence within a document, enabling phrase queries and proximity queries.
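A small sketch of how stored positions answer a phrase query: a document matches when the phrase terms occur at consecutive positions. Whitespace tokenization and the helper names are assumptions for illustration only.

```python
from collections import defaultdict

def build_positional_index(docs):
    """Map term -> {doc_id: [positions]} using whitespace tokenization."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def phrase_match(index, phrase):
    """Return sorted doc IDs where the phrase terms occur at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(t not in index for t in terms):
        return []
    hits = []
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            # Term k of the phrase must sit exactly k positions after the first.
            if all(start + k in index[t].get(doc_id, []) for k, t in enumerate(terms)):
                hits.append(doc_id)
                break
    return sorted(hits)

docs = {1: "new york city", 2: "york is not new", 3: "a new york minute"}
index = build_positional_index(docs)
```

Here `phrase_match(index, "new york")` finds documents 1 and 3 but not 2, which contains both words in a different order. A proximity query would relax the exact-offset check to a maximum distance.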
-
Merge Policy
A merge policy defines the rules governing when and how Lucene index segments are merged. The policy sets the tradeoff between indexing throughput, search performance, and disk usage.
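To make the "when and how" concrete, here is a deliberately simplified toy policy: once the segment count reaches a hypothetical `merge_factor`, the smallest segments are merged into one. It is not any of Lucene's built-in policies, and segments are represented only by their document counts.

```python
def select_merge(segment_sizes, merge_factor=3):
    """Return indices of the merge_factor smallest segments if a merge is due, else []."""
    if len(segment_sizes) < merge_factor:
        return []
    by_size = sorted(range(len(segment_sizes)), key=lambda i: segment_sizes[i])
    return sorted(by_size[:merge_factor])

def apply_merge(segment_sizes, chosen):
    """Replace the chosen segments with one segment holding all their documents."""
    merged = sum(segment_sizes[i] for i in chosen)
    rest = [s for i, s in enumerate(segment_sizes) if i not in chosen]
    return rest + [merged]

segments = []
for _ in range(5):
    segments.append(10)  # each flush writes a new 10-document segment
    chosen = select_merge(segments)
    if chosen:
        segments = apply_merge(segments, chosen)
```

A lower `merge_factor` keeps fewer segments for searches to visit but rewrites the same documents more often, which is the throughput side of the tradeoff.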
-
Forward Index
A forward index maps each document to the list of terms it contains. It is the natural output of document ingestion and the starting point for building an inverted index.
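A minimal sketch of the relationship: the forward index falls out of tokenizing each document, and inverting it yields a term-to-documents mapping. Whitespace tokenization and the function names are assumptions for the example.

```python
def build_forward_index(docs):
    """Map doc_id -> list of terms, in document order."""
    return {doc_id: text.lower().split() for doc_id, text in docs.items()}

def invert(forward):
    """Map term -> sorted list of doc IDs containing it."""
    inverted = {}
    for doc_id in sorted(forward):
        # set() so a term repeated within a document is listed once.
        for term in set(forward[doc_id]):
            inverted.setdefault(term, []).append(doc_id)
    return inverted

docs = {1: "the quick fox", 2: "the lazy dog"}
forward = build_forward_index(docs)
inverted = invert(forward)
```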
-
Field Type
A field type is a named schema definition that specifies how a field’s values are stored, indexed, and analysed. It bundles an analyzer, storage options, and index behaviour into a reusable configuration.
-
DocValues
DocValues is a column-oriented on-disk data structure in Lucene that maps each document ID to its field values, enabling efficient sorting, faceting, and aggregations without un-inverting the inverted index in memory.
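The column-oriented idea can be sketched in memory as one array per field, indexed by document ID, so sorting or faceting a result set needs only one lookup per matching document. The field names and values are hypothetical, and Lucene's on-disk encoding is not modelled.

```python
from array import array

# One column per field, indexed by document ID (doc IDs 0..3 here).
price = array("d", [19.99, 5.49, 12.00, 7.25])  # hypothetical numeric field
category = ["book", "toy", "book", "game"]       # hypothetical keyword field

def sort_by_column(doc_ids, column):
    """Sort matching doc IDs by a field value, one column lookup per document."""
    return sorted(doc_ids, key=lambda d: column[d])

def facet_counts(doc_ids, column):
    """Count matching documents per distinct field value."""
    counts = {}
    for d in doc_ids:
        counts[column[d]] = counts.get(column[d], 0) + 1
    return counts

hits = [0, 2, 3]  # doc IDs assumed to match some query
```

The inverted index answers "which documents contain this term"; a column answers the opposite question, "what value does this document have", which is the access pattern sorting and faceting need.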
-
Trie
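A trie is a tree in which each path from the root to a node spells out a prefix. Looking up a term or locating a prefix takes O(k) time, where k is the length of the query string; enumerating the matching terms for autocomplete then costs time proportional to the size of the subtree below that node.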
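A minimal dictionary-based sketch of those operations. The class and method names are illustrative; production term dictionaries typically use more compact structures.

```python
class Trie:
    def __init__(self):
        self.children = {}
        self.is_term = False

    def insert(self, term):
        node = self
        for ch in term:
            node = node.children.setdefault(ch, Trie())
        node.is_term = True

    def _find(self, prefix):
        """Walk one child per character; return the node reached, or None."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return None
        return node

    def contains(self, term):
        node = self._find(term)
        return node is not None and node.is_term

    def with_prefix(self, prefix):
        """Return all stored terms starting with prefix, in sorted order."""
        start = self._find(prefix)
        results = []

        def walk(node, path):
            if node.is_term:
                results.append(path)
            for ch in sorted(node.children):
                walk(node.children[ch], path + ch)

        if start is not None:
            walk(start, prefix)
        return results

trie = Trie()
for term in ["search", "seat", "segment", "sort"]:
    trie.insert(term)
```

`contains` touches one node per character of the query, while `with_prefix` additionally visits every node under the prefix, so its cost grows with the number of completions.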
-
Inverted Index
An inverted index maps each unique term in a corpus to the documents, and optionally the positions, where it appears. A query reads only the postings for its own terms rather than scanning every document, which keeps full-text search fast as the corpus grows.
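A small sketch of the lookup path, assuming whitespace tokenization and a naive score that sums term frequencies. Both are simplifications for illustration, not how Lucene analyses or ranks.

```python
from collections import Counter, defaultdict

def build_inverted_index(docs):
    """Map term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, freq in Counter(text.lower().split()).items():
            index[term][doc_id] = freq
    return index

def search(index, query):
    """Rank documents by summed term frequency over the query terms."""
    scores = Counter()
    for term in query.lower().split():
        # Only this term's postings are read; other documents are never touched.
        for doc_id, freq in index.get(term, {}).items():
            scores[doc_id] += freq
    # Highest score first; ties broken by ascending doc ID.
    return [doc_id for doc_id, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]

docs = {1: "red fish blue fish", 2: "one fish two fish", 3: "green eggs and ham"}
index = build_inverted_index(docs)
```

The work done by `search` depends on the length of the postings for the query terms, not on the total number of documents indexed.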