NDCG
What it is
Normalized Discounted Cumulative Gain (NDCG) measures ranking quality by giving credit for relevant results but with diminishing value at lower ranks. NDCG is particularly suitable for ranking tasks (recommendation, search) where the position of relevant items matters. It accounts for graded relevance (not just binary relevant/irrelevant).
[illustrate: DCG calculation showing diminishing gains at lower ranks; comparison of different ranking orders and their DCG/NDCG]
How it works
Discounted Cumulative Gain (DCG):
DCG = Σ (2^rel(i) - 1) / log_2(i + 1) for i = 1 to k
Where rel(i) = relevance grade of the result at position i (0/1 for binary, or a scale such as 0–3 or 0–5 for graded)
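The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `dcg_at_k` and its parameters are assumptions made for this example, and the input is taken to be a list of relevance grades already in ranked order.

```python
import math

def dcg_at_k(relevances, k=None):
    """DCG with exponential gain (2^rel - 1) and a log2(i + 1) discount.

    `relevances` holds the grade of each result in ranked order.
    `k` (hypothetical parameter) truncates the list to the top k; None keeps all.
    """
    if k is not None:
        relevances = relevances[:k]
    # Start enumerate at 1 so that i matches the 1-based rank in the formula.
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

# Grades 3, 2, 0, 1 at ranks 1 to 4: 7/1 + 3/1.58 + 0/2 + 1/2.32
print(round(dcg_at_k([3, 2, 0, 1]), 2))       # 9.32
print(round(dcg_at_k([3, 2, 0, 1], k=1), 2))  # 7.0
```

Results with grade 0 contribute nothing, because 2^0 - 1 = 0.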
Ideal DCG (IDCG):
- DCG of the ideal ranking: the same items sorted by descending relevance grade, so the most relevant items come first
- Varies by query (depends on # relevant items)
NDCG:
NDCG = DCG / IDCG
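- Normalized: NDCG ∈ [0, 1] for non-negative relevance grades (if a query has no relevant items, IDCG = 0 and NDCG is undefined; it is commonly reported as 0)
- Comparable across queries that differ in difficulty or in the number of relevant items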
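A minimal sketch of the normalization step follows. The names `ndcg_at_k` and `judged_pool` are hypothetical, and returning 0.0 when IDCG is 0 is one common convention assumed here rather than part of the definition.

```python
import math

def dcg(relevances):
    # Exponential gain with log2(i + 1) discount, ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

def ndcg_at_k(ranked_relevances, k=None, judged_pool=None):
    """NDCG@k for one query.

    `ranked_relevances`: grades of the returned results, in ranked order.
    `judged_pool` (assumed parameter): grades of all judged items for the
    query; if omitted, the ideal ranking is built from the returned results.
    """
    pool = ranked_relevances if judged_pool is None else judged_pool
    ideal = sorted(pool, reverse=True)[:k]   # best possible top-k ordering
    idcg = dcg(ideal)
    if idcg == 0:
        return 0.0  # assumed convention: no relevant items gives a score of 0
    return dcg(ranked_relevances[:k]) / idcg

print(ndcg_at_k([3, 2, 1, 0]))        # 1.0 (already in ideal order)
print(round(ndcg_at_k([0, 3]), 2))    # 0.63 (relevant item pushed to rank 2)
print(ndcg_at_k([0, 0, 0]))           # 0.0
```

Passing `judged_pool` matters when relevant items are missing from the returned list: without it, the ideal ranking only reorders what was returned, which can overstate the score.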
Properties:
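- Log discount: the divisor log_2(i + 1) is 1 at position 1, 1.58 at position 2, 2.32 at position 4, and 3.46 at position 10
- Positions matter: a relevant result at rank 1 is worth about 1.58x the same result at rank 2, and 2x the same result at rank 3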
- Graded relevance supported
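The discount values can be checked directly. The short sketch below (the helper name `discount` is an assumption for this example) prints the divisor log_2(i + 1) and the resulting weight 1 / log_2(i + 1) for a few positions.

```python
import math

def discount(position):
    # Divisor applied to the gain of the result at this 1-based position.
    return math.log2(position + 1)

for pos in (1, 2, 3, 4, 10):
    d = discount(pos)
    print(f"position {pos}: divisor {d:.2f}, weight {1 / d:.2f}")

# Divisors: 1.00, 1.58, 2.00, 2.32, 3.46
# Weights:  1.00, 0.63, 0.50, 0.43, 0.29
```

The weight drops quickly over the first few ranks and then flattens, so swapping two results near the top changes the score more than swapping two results further down.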
Example
Query: "best machine learning libraries"
Ideal ranking (5 relevant items):
1. Scikit-learn (rel=3)
2. TensorFlow (rel=3)
3. PyTorch (rel=3)
4. Keras (rel=2)
5. Numpy (rel=2)
Actual ranking:
1. Scikit-learn (rel=3)
2. Restaurant (rel=0)
3. TensorFlow (rel=3)
4. Sports (rel=0)
5. PyTorch (rel=3)
DCG = (2^3-1)/log_2(2) + (2^0-1)/log_2(3) + (2^3-1)/log_2(4)
+ (2^0-1)/log_2(5) + (2^3-1)/log_2(6)
= 7/1 + 0/1.58 + 7/2 + 0/2.32 + 7/2.58
= 7 + 0 + 3.5 + 0 + 2.71
= 13.21
IDCG = (2^3-1)/1 + (2^3-1)/1.58 + (2^3-1)/2 + (2^2-1)/2.32 + (2^2-1)/2.58
= 7 + 4.42 + 3.5 + 1.29 + 1.16
= 17.37
NDCG = 13.21 / 17.37 = 0.76
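The worked example can be reproduced with a short script. This is a sketch under the same assumptions as the formulas above (exponential gain, log_2(i + 1) discount); the variable names are illustrative. Because the script uses unrounded logarithms, the last digit of an intermediate value can differ slightly from a hand calculation that rounds each divisor first.

```python
import math

def dcg(relevances):
    # Exponential gain with log2(i + 1) discount, ranks starting at 1.
    return sum((2 ** rel - 1) / math.log2(i + 1)
               for i, rel in enumerate(relevances, start=1))

actual = [3, 0, 3, 0, 3]   # Scikit-learn, Restaurant, TensorFlow, Sports, PyTorch
ideal = [3, 3, 3, 2, 2]    # Scikit-learn, TensorFlow, PyTorch, Keras, Numpy

dcg_actual = dcg(actual)
idcg = dcg(ideal)
ndcg = dcg_actual / idcg

print(f"DCG = {dcg_actual:.2f}, IDCG = {idcg:.2f}, NDCG = {ndcg:.2f}")
# DCG = 13.21, IDCG = 17.37, NDCG = 0.76
```

Note that the ideal list is built from all five judged relevant items, not from the five results actually returned, so the two missing items (Keras and Numpy) lower the score as well as the misplaced ones.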
Variants and history
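NDCG was introduced by Järvelin & Kekäläinen (2000) for information retrieval (IR) evaluation. Because it supports graded relevance, it can distinguish highly relevant results from marginally relevant ones. NDCG@k evaluates only the top k results, which is standard for large result sets. Formulations differ in the gain function: the exponential gain 2^rel(i) - 1 used here emphasizes highly relevant results, while a linear gain rel(i) is also common, so scores are comparable only when they use the same gain and discount. NDCG is widely adopted in industry (Google, Bing, others) and academia, and learning-to-rank methods often optimize it as a target.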
When to use it
Use NDCG when:
- Ranking position matters (search results, recommendations)
- Relevance is graded (not just binary relevant/irrelevant)
- Comparing systems across queries of varying difficulty
- Top-k results are most important (user views only top results)
- You are training or tuning learning-to-rank systems (which often optimize for NDCG)
NDCG is more expressive than precision and recall, which treat results as an unordered set and ignore graded relevance, but it is also more complex to compute and interpret.