Okapi BM25

What it is

Okapi BM25 is the original published formulation of the BM25 ranking function, so named for the Okapi information retrieval system at City University London where it was developed. In practice, “Okapi BM25” and “BM25” refer to the same algorithm — the “Okapi” prefix distinguishes it historically from the BM25 variants (BM25F, BM25+) that came later.

How it works

The Okapi BM25 formula is identical to the one described in the BM25 entry. It derives from the Probability Ranking Principle (Robertson, 1977): rank documents in decreasing order of their estimated probability of relevance. BM25 does not compute that probability directly; it produces a score intended to order documents the same way, using term frequency, document length, and collection statistics such as document frequency and average document length.
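For reference, the commonly cited form of the formula is sketched below. Here f(q_i, D) is the frequency of query term q_i in document D, |D| is the document's length in tokens, avgdl is the average document length in the collection, N is the number of documents, and n(q_i) is the number of documents containing q_i. This is the standard textbook presentation rather than a transcription of the original paper, and implementations differ mainly in the IDF term: Lucene, for example, uses ln(1 + (N − n(q_i) + 0.5)/(n(q_i) + 0.5)) so that IDF never goes negative.

```latex
\operatorname{score}(D, Q) = \sum_{q_i \in Q} \operatorname{IDF}(q_i)\,
  \frac{f(q_i, D)\,(k_1 + 1)}
       {f(q_i, D) + k_1 \left(1 - b + b\,\frac{|D|}{\mathrm{avgdl}}\right)}

\operatorname{IDF}(q_i) = \ln \frac{N - n(q_i) + 0.5}{n(q_i) + 0.5}
```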

The critical parameters:

  • k₁ (default 1.2–2.0) — controls TF saturation. Higher values allow TF to keep growing; lower values saturate faster.
  • b (default 0.75) — length normalisation strength. b=1 fully normalises by document length; b=0 disables it.

These defaults were validated empirically on the TREC test collections used during the Okapi project.
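The Python sketch below shows where k₁ and b enter the score. It is a minimal, unoptimised illustration, not any library's implementation: the function name bm25_scores, the pre-tokenised list-of-strings input, counting each distinct query term once, and the non-negative IDF variant ln(1 + (N − n + 0.5)/(n + 0.5)) (the form Lucene uses) are all assumptions.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Return one BM25 score per document for a tokenised query.

    Hypothetical helper for illustration. `query` is a list of terms and
    `docs` a list of term lists; at least one document must be non-empty.
    IDF uses the non-negative variant ln(1 + (N - n + 0.5) / (n + 0.5)),
    an assumption borrowed from Lucene rather than the original paper.
    """
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalisation factor: 1 when b == 0, |D| / avgdl when b == 1.
        norm = 1 - b + b * len(doc) / avgdl
        score = 0.0
        for term in set(query):  # each distinct query term counted once
            f = tf[term]
            if f == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Saturating TF: approaches (k1 + 1) as f grows, for fixed norm.
            score += idf * f * (k1 + 1) / (f + k1 * norm)
        scores.append(score)
    return scores


if __name__ == "__main__":
    corpus = [s.split() for s in (
        "the cat sat on the mat",
        "the dog sat on the log",
        "cats and dogs",
    )]
    print(bm25_scores(["cat", "mat"], corpus))
```

Two quick checks of the parameter behaviour: with k1=0 every matching document scores exactly the term's IDF (TF becomes a binary present/absent signal), and with b=0 two documents with the same term frequency score identically regardless of their lengths.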

Variants and history

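The Okapi project ran at City University London from the late 1980s through the 1990s, led by Stephen Robertson with Steve Walker; Karen Spärck Jones, based at the University of Cambridge, was a long-standing collaborator on the probabilistic model underlying it. Key milestones: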

  • 1994: Robertson and Walker's SIGIR paper “Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval” develops the approximations behind the BM weighting functions, and the Okapi group's TREC-3 paper (“Okapi at TREC-3”) introduces BM25 itself.
  • 1996–99: Okapi continues to compete in TREC retrieval tracks, establishing BM25 as a standard, hard-to-beat baseline.
  • 2012: Lucene 4.0 ships BM25Similarity as an optional similarity; it becomes the default in Lucene 6.0 (2016).
  • 2016–present: Elasticsearch (from 5.0) and Solr (from 6.0) use BM25 as the default scoring function, as does OpenSearch since forking from Elasticsearch in 2021.

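“BM” stands for “Best Matching”, and the “25” reflects the scheme's numbering in the Okapi team's series of weighting functions. BM11 and BM15 are earlier members of the same family that differ mainly in how they treat document length; BM25 interpolates between their behaviours through the b parameter and proved the most robust across TREC collections.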

When to use it

“Okapi BM25” as a term appears mainly in academic IR literature and when you need to be precise about which BM25 variant you are using. In engineering contexts, “BM25” and “Okapi BM25” are interchangeable.

Use this entry as a disambiguation point: if a paper or system mentions “Okapi BM25”, it means the standard BM25 formula with k₁ and b parameters, not BM25F (multi-field) or BM25+ (lower-bound TF).

See also