Ranking
-
Vector Space Model
The vector space model (VSM) represents documents and queries as vectors in a high-dimensional term space and ranks documents by their cosine similarity to the query vector.
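As a rough illustration of the ranking step, the sketch below builds raw term-count vectors and orders documents by cosine similarity to the query. The whitespace tokeniser and the helper names `cosine_similarity` and `rank` are assumptions made for this example; real systems usually weight terms (for instance with TF-IDF) rather than using raw counts.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction
    return dot / (norm_a * norm_b)


def rank(query: str, docs: list[str]) -> list[tuple[int, float]]:
    """Return (document index, similarity) pairs, best match first."""
    q = Counter(query.lower().split())  # naive whitespace tokeniser (assumption)
    scored = [
        (i, cosine_similarity(q, Counter(d.lower().split())))
        for i, d in enumerate(docs)
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Because cosine similarity normalises by vector length, a short document made up entirely of query terms can outrank a longer one that mentions them only in passing.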
-
Reranker
Second-stage model that re-scores the candidate set returned by first-stage retrieval; improves ranking quality while keeping cost bounded, because the more expensive model is applied only to a small number of candidates.
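A minimal sketch of the two-stage pattern, assuming both stages are exposed as plain scoring callables. The function name `retrieve_then_rerank` and the parameters `k` and `top_n` are hypothetical and chosen for this example only.

```python
from typing import Callable

Scorer = Callable[[str, str], float]  # (query, document) -> score


def retrieve_then_rerank(
    query: str,
    docs: list[str],
    first_stage: Scorer,
    reranker: Scorer,
    k: int = 100,
    top_n: int = 10,
) -> list[str]:
    """Keep the k best documents by the cheap scorer, then reorder them."""
    # Stage 1: cheap scorer over the whole collection.
    candidates = sorted(docs, key=lambda d: first_stage(query, d), reverse=True)[:k]
    # Stage 2: expensive scorer over the k candidates only.
    return sorted(candidates, key=lambda d: reranker(query, d), reverse=True)[:top_n]
```

The reranker is called at most `k` times per query, which is what keeps the second stage affordable.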
-
Probabilistic Retrieval Model
Probabilistic retrieval models rank documents by their estimated probability of relevance to a query. BM25 is the best-known and most widely deployed ranking function derived from this framework; language models offer an alternative probabilistic framework.
-
Okapi BM25
Okapi BM25 is the full name of BM25, after the Okapi IR system at City University, London, where the function was first implemented in the early 1990s. The name ‘Okapi BM25’ honours the system; in practice it is synonymous with BM25.
-
Mean Reciprocal Rank
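Mean, over a set of queries, of the reciprocal rank of the first relevant result; measures how near the top of the list the system places its first correct answer. A query with no relevant result contributes 0. Abbreviated MRR.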
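The calculation can be sketched as follows, assuming each query comes with a ranked list of document ids and a set of relevant ids. The function name and argument layout are illustrative only.

```python
def mean_reciprocal_rank(
    results: list[list[str]], relevant: list[set[str]]
) -> float:
    """Average of 1/rank of the first relevant hit; 0 for queries with none."""
    if not results:
        return 0.0
    total = 0.0
    for ranked, rel in zip(results, relevant):
        for position, doc_id in enumerate(ranked, start=1):
            if doc_id in rel:
                total += 1.0 / position
                break  # only the first relevant result counts
    return total / len(results)
```

For three queries whose first relevant results sit at rank 1, rank 3, and nowhere, the reciprocal ranks are 1, 1/3, and 0, giving an MRR of 4/9.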
-
Learning to Rank
Learning to rank (LTR) trains a model to produce an optimal ordering of documents for a query using labelled relevance data, combining signals such as BM25, click-through rate, and document features.
-
Hybrid Search
Combining dense vector similarity and sparse term-matching scores to balance semantic understanding with keyword precision.
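One simple way to combine the two signals is a weighted sum of min-max normalised scores, sketched below. This is only one possible fusion scheme; the weight `alpha` and the helper names are assumptions for the example, and rank-based fusion methods are a common alternative.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale scores to [0, 1] so the two score scales are comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}  # all tied: treat as equally good
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(
    dense: dict[str, float], sparse: dict[str, float], alpha: float = 0.5
) -> dict[str, float]:
    """alpha weights the dense score; (1 - alpha) weights the sparse score."""
    d, s = min_max(dense), min_max(sparse)
    return {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in d.keys() | s.keys()  # a document may appear in only one list
    }
```

Normalising first matters because cosine similarities and BM25 scores live on very different scales; without it, one signal would dominate the sum.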
-
Cross-Encoder
Neural architecture that encodes the query and a document together in a single pass, producing an accurate relevance score for the pair; typically used to rerank the candidates returned by first-stage retrieval.
-
Boosting
Adjusts the relevance score contribution of a field, term, or query clause by multiplying its base score by a boost factor, so that preferred matches rank higher. A common tool for tuning ranking.
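As a toy illustration, assuming each clause or field has already produced a base score, boosting amounts to a per-clause multiplier applied before the scores are summed. The function and field names are hypothetical.

```python
def boosted_score(
    clause_scores: dict[str, float], boosts: dict[str, float]
) -> float:
    """Sum clause scores, multiplying each by its boost (default 1.0)."""
    return sum(
        score * boosts.get(name, 1.0) for name, score in clause_scores.items()
    )


# A title match boosted 3x contributes 6.0 instead of 2.0.
total = boosted_score({"title": 2.0, "body": 1.5}, {"title": 3.0})
```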
-
BM25F
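BM25F extends BM25 to multi-field documents by weighting and length-normalising the term frequency in each field separately, then combining the fields into a single pseudo-frequency before term-frequency saturation is applied. Title matches can therefore outweigh body matches without simply multiplying the final score.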
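A sketch of the per-term computation in the simple form of BM25F, assuming a single `b` shared by all fields and an IDF value supplied by the caller. The function name and the dictionary layout are illustrative.

```python
def bm25f_term_score(
    field_tf: dict[str, int],
    field_len: dict[str, int],
    avg_field_len: dict[str, float],
    weights: dict[str, float],
    idf: float,
    k1: float = 1.2,
    b: float = 0.75,
) -> float:
    """Score one query term against one multi-field document."""
    pseudo_tf = 0.0
    for field, tf in field_tf.items():
        # Length-normalise within the field, then apply the field weight.
        norm = 1 - b + b * field_len[field] / avg_field_len[field]
        pseudo_tf += weights[field] * tf / norm
    # Saturation is applied once, to the combined pseudo-frequency.
    return idf * pseudo_tf / (k1 + pseudo_tf)
```

Because saturation happens after the fields are combined, repeated matches across several fields do not each earn a fresh, unsaturated contribution, as they would if per-field BM25 scores were simply added.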
-
BM25+
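BM25+ addresses a deficiency in BM25's length normalisation: the term-frequency component has no positive lower bound, so a very long document that contains a query term can score barely higher than, and in multi-term queries be ranked below, shorter documents that do not contain it at all. BM25+ adds a small constant as a lower bound to the TF contribution of every matching term.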
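The effect of the lower bound can be seen in the term-frequency component alone. The sketch below assumes the usual BM25 parameters `k1` and `b` and a constant `delta`; setting `delta=0.0` gives plain BM25, and 1.0 is a commonly cited value for BM25+. The function name is made up for this example.

```python
def tf_component(
    tf: int,
    doc_len: int,
    avg_len: float,
    k1: float = 1.2,
    b: float = 0.75,
    delta: float = 0.0,
) -> float:
    """TF part of the score for one term; delta > 0 gives the BM25+ variant."""
    if tf == 0:
        return 0.0  # the bonus applies only to documents containing the term
    norm = k1 * (1 - b + b * doc_len / avg_len)
    return tf * (k1 + 1) / (tf + norm) + delta


# A document 1000x longer than average, containing the term once:
plain = tf_component(1, doc_len=100_000, avg_len=100.0)            # close to 0
plus = tf_component(1, doc_len=100_000, avg_len=100.0, delta=1.0)  # at least 1.0
```

With plain BM25 the contribution of the match almost vanishes as the document grows; with the constant added, a matching document always keeps a clear margin over a non-matching one.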
-
BM25
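BM25 (Best Match 25) is a probabilistic ranking function that scores documents against a query by combining a saturating term-frequency component with inverse document frequency, normalised by document length. Two parameters, commonly written k1 and b, control the saturation and the strength of length normalisation.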
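A compact sketch of the scoring function over a small in-memory collection. The whitespace tokeniser, the defaults `k1=1.2` and `b=0.75`, and the non-negative IDF variant `ln(1 + (N - n + 0.5) / (n + 0.5))` are choices made for this example; implementations differ in these details.

```python
import math
from collections import Counter


def bm25_scores(
    query: str, docs: list[str], k1: float = 1.2, b: float = 0.75
) -> list[float]:
    """Return one BM25 score per document, in input order."""
    if not docs:
        return []
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    avgdl = sum(len(d) for d in tokenised) / n_docs or 1.0  # avoid dividing by 0
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenised for t in set(d))
    scores = []
    for doc in tokenised:
        tf = Counter(doc)
        norm = k1 * (1 - b + b * len(doc) / avgdl)  # length normalisation
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores
```

The term-frequency factor approaches `k1 + 1` as `tf` grows, so repeating a term many times yields diminishing returns rather than a linear increase.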
-
TF-IDF
TF-IDF (term frequency–inverse document frequency) is a numerical statistic that reflects how important a word is to a document relative to a corpus, used as a relevance signal in search ranking.
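A minimal sketch of one common variant, assuming length-normalised term frequency and an unsmoothed IDF of `ln(N / df)`. Many other weighting schemes exist, and the function name and tokeniser are illustrative.

```python
import math
from collections import Counter


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Return a term -> weight mapping for each document."""
    tokenised = [d.lower().split() for d in docs]
    n_docs = len(tokenised)
    # Document frequency: number of documents containing each term.
    df = Counter(t for d in tokenised for t in set(d))
    weights = []
    for doc in tokenised:
        counts = Counter(doc)
        weights.append(
            {
                term: (count / len(doc)) * math.log(n_docs / df[term])
                for term, count in counts.items()
            }
        )
    return weights
```

A term that occurs in every document gets an IDF of `ln(1) = 0`, so common words contribute nothing, while terms concentrated in a few documents receive the highest weights.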