NDCG

What it is

Normalized Discounted Cumulative Gain (NDCG) measures ranking quality by crediting each relevant result, with the credit shrinking the lower the result appears in the ranking. It suits ranking tasks such as search and recommendation, where the position of relevant items matters, and it handles graded relevance rather than only binary relevant/irrelevant judgments.

[illustrate: DCG calculation showing diminishing gains at lower ranks; comparison of different ranking orders and their DCG/NDCG]

How it works

Discounted Cumulative Gain (DCG):

DCG = Σ (2^rel(i) - 1) / log_2(i + 1)  for i = 1 to k

Where rel(i) = relevance grade of the result at position i (0/1 for binary, or a graded scale such as 0–5) and k = the cutoff rank (number of results evaluated)
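
The DCG formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name dcg and its optional cutoff parameter k are assumptions made for this example.

```python
import math

def dcg(relevances, k=None):
    """DCG with exponential gain (2^rel - 1) and a log2(i + 1) discount.

    `relevances` holds the relevance grades in ranked order (position 1 first).
    `k` is an optional cutoff; by default the whole list is scored.
    """
    if k is not None:
        relevances = relevances[:k]
    # enumerate from 1 so that `i` matches the 1-based position in the formula
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

# Hypothetical grades for four results, in ranked order
print(round(dcg([3, 2, 0, 1]), 2))       # 9.32
print(round(dcg([3, 2, 0, 1], k=1), 2))  # 7.0
```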

Ideal DCG (IDCG):

DCG of the ideal ranking (all judged items sorted by descending relevance grade)
Varies by query (depends on the number and grades of its relevant items)

NDCG:

NDCG = DCG / IDCG

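Normalized: NDCG ∈ [0, 1], where 1 means the actual ranking matches the ideal one (if a query has no relevant items, IDCG = 0 and NDCG is undefined; it is conventionally set to 0)
Comparable across queries even when they differ in the number and grades of relevant items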
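
The normalization step can be sketched as follows. The names ndcg, ranked_relevances and judged_relevances are assumptions for this example, and returning 0.0 when IDCG is zero is a convention rather than part of the definition.

```python
import math

def dcg(relevances):
    # Exponential gain, log2 discount, 1-based positions
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg(ranked_relevances, judged_relevances=None, k=None):
    """NDCG = DCG / IDCG.

    ranked_relevances: grades in the order the system returned the items.
    judged_relevances: grades of all judged items for the query; if omitted,
        the returned items are assumed to be the full judged set.
    k: cutoff rank; defaults to the length of the returned list.
    """
    if judged_relevances is None:
        judged_relevances = ranked_relevances
    if k is None:
        k = len(ranked_relevances)
    # Ideal ranking: best grades first, truncated to the same cutoff
    ideal = sorted(judged_relevances, reverse=True)[:k]
    idcg = dcg(ideal)
    if idcg == 0:
        return 0.0  # no relevant items: NDCG is undefined, 0 by convention
    return dcg(ranked_relevances[:k]) / idcg

print(ndcg([3, 2, 1, 0]))  # 1.0 (already in ideal order)
print(ndcg([0, 0, 0]))     # 0.0 (no relevant items)
```

Passing the full set of judged grades matters when the system fails to retrieve some relevant items: sorting only the retrieved grades would make the ideal ranking look worse than it is and inflate the score.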

Properties:

Log discount log_2(i + 1): position 1 = 1, position 2 = 1.58, position 4 = 2.32, position 10 = 3.46
Positions matter: a result at rank 1 counts about 1.58x as much as the same result at rank 2 (weights 1/1 vs. 1/1.58)
Graded relevance supported
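
To check the discount values, a short sketch can tabulate log_2(i + 1) and the resulting weight 1 / log_2(i + 1) for a few positions. The helper name discount is an assumption for this example.

```python
import math

def discount(position):
    # Denominator of the DCG term for a 1-based rank position
    return math.log2(position + 1)

for position in (1, 2, 3, 4, 5, 10):
    d = discount(position)
    print(f"position {position:>2}: discount = {d:.2f}, weight = {1 / d:.2f}")

# How much more a result counts at rank 1 than at rank 2
print(f"rank 1 vs rank 2: {discount(2) / discount(1):.2f}x")
```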

Example

Query: "best machine learning libraries"
Ideal ranking (5 relevant items):
  1. Scikit-learn (rel=3)
  2. TensorFlow (rel=3)
  3. PyTorch (rel=3)
  4. Keras (rel=2)
  5. NumPy (rel=2)

Actual ranking:
  1. Scikit-learn (rel=3)
  2. Restaurant (rel=0)
  3. TensorFlow (rel=3)
  4. Sports (rel=0)
  5. PyTorch (rel=3)

DCG = (2^3-1)/log_2(2) + (2^0-1)/log_2(3) + (2^3-1)/log_2(4)
        + (2^0-1)/log_2(5) + (2^3-1)/log_2(6)
    = 7/1 + 0/1.58 + 7/2 + 0/2.32 + 7/2.58
    = 7 + 0 + 3.5 + 0 + 2.71
    = 13.21

IDCG = (2^3-1)/1 + (2^3-1)/1.58 + (2^3-1)/2 + (2^2-1)/2.32 + (2^2-1)/2.58
     = 7 + 4.42 + 3.5 + 1.29 + 1.16
     = 17.37

NDCG = 13.21 / 17.37 = 0.76
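
The worked example can be reproduced with a short script. It is a sketch that uses full-precision logarithms, so the last digit may differ slightly from a hand calculation with rounded logs. The variable names are assumptions for this example.

```python
import math

def dcg(relevances):
    # Exponential gain (2^rel - 1), discounted by log2(rank + 1); ranks are 1-based
    return sum((2 ** rel - 1) / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))

actual = [3, 0, 3, 0, 3]  # Scikit-learn, Restaurant, TensorFlow, Sports, PyTorch
ideal = [3, 3, 3, 2, 2]   # Scikit-learn, TensorFlow, PyTorch, Keras, NumPy

dcg_actual = dcg(actual)
idcg = dcg(ideal)
ndcg = dcg_actual / idcg

print(f"DCG  = {dcg_actual:.2f}")  # 13.21
print(f"IDCG = {idcg:.2f}")
print(f"NDCG = {ndcg:.2f}")        # 0.76

# Pitfall: building the ideal list from the retrieved items only ignores
# Keras and NumPy, which shrinks IDCG and inflates the score.
inflated = dcg_actual / dcg(sorted(actual, reverse=True))
print(f"NDCG from retrieved items only = {inflated:.2f}")
```

The ideal list includes Keras and NumPy even though the system never returned them, which is why IDCG has to come from all judged items for the query and not from the retrieved list alone.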

Variants and history

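Järvelin & Kekäläinen introduced DCG in 2000 for IR evaluation and formalized the normalized variant in 2002. Their formulation uses the relevance grade itself as the gain; the exponential gain 2^rel - 1 used above, common in web search and learning to rank, puts more weight on highly relevant results. NDCG@k evaluates only the top k results and is the standard choice for large result sets. The metric is widely adopted in industry (Google, Bing, others) and academia, and learning-to-rank methods often optimize for it.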

When to use it

Use NDCG when:

Ranking position matters (search results, recommendations)
Graded relevance (not just binary relevant/irrelevant)
Comparing systems across queries of varying difficulty
Top-k results are most important (user views only top results)
Learning-to-rank systems (optimize for NDCG)
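
To compare systems across queries, a common approach is to average NDCG@k over the query set. The sketch below does that for two systems. The query data, the system names and the helper names (dcg_at_k, ndcg_at_k, mean_ndcg) are all hypothetical and serve only as illustration.

```python
import math
from statistics import mean

def dcg_at_k(relevances, k):
    # Score only the top k positions
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(ranked, judged, k):
    # IDCG comes from all judged grades, sorted best first
    idcg = dcg_at_k(sorted(judged, reverse=True), k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0

# Hypothetical data: per query, the grades of all judged items and the grades
# of the results each system returned, in ranked order.
queries = {
    "q1": {"judged": [3, 2, 1, 0, 0], "system_a": [3, 2, 1], "system_b": [1, 2, 3]},
    "q2": {"judged": [1, 0, 0, 0], "system_a": [0, 1, 0], "system_b": [1, 0, 0]},
}

def mean_ndcg(system, k=3):
    return mean(ndcg_at_k(q[system], q["judged"], k) for q in queries.values())

for system in ("system_a", "system_b"):
    print(f"{system}: mean NDCG@3 = {mean_ndcg(system):.3f}")
```

Because each query is normalized by its own IDCG, the easy query q1 (several relevant items) and the hard query q2 (a single relevant item) contribute on the same 0 to 1 scale.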

NDCG captures more than precision and recall (rank position and graded relevance) but is also more complex to compute and interpret.