Precision at k

What it is

Precision at k (P@k) is precision computed over only the top k results of a ranked list: the number of relevant results in the top k, divided by k. Because users typically look at only the first 10 or 20 results, P@k tracks what they actually see more closely than precision over the full result set. Common cutoffs are P@1, P@5, P@10, and P@100.

[illustrate: Retrieved ranking showing top-k window; relevant items within highlighted; P@k calculation]

How it works

Formula:

P@k = (# relevant documents in top-k) / k

Properties:

Range: 0–1 (or 0–100%)
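P@1: is the top result relevant? (binary: 0 or 1)
P@10: what fraction of the top 10 results is relevant?
P@100: precision at a deeper cutoff; it is still precision, not recall, because the denominator is k rather than the total number of relevant documents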

Typical cutoffs:

P@1, P@5, P@10: emphasize top results (user-focused)
P@100, P@1000: deeper cutoffs, used when many results are consumed (recall-oriented tasks)
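
The formula can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name precision_at_k and the list-of-booleans input are assumptions made for the example. It also assumes the common convention that the denominator stays at k even when fewer than k results were retrieved, so missing results count as irrelevant.

```python
from typing import Sequence


def precision_at_k(relevance: Sequence[bool], k: int) -> float:
    """Fraction of the top-k ranked results that are relevant.

    relevance[i] is True when the result at rank i + 1 is relevant.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    # Slicing stops at the end of the list, so a ranking shorter than k
    # contributes fewer hits while the denominator stays at k.
    hits = sum(1 for is_relevant in relevance[:k] if is_relevant)
    return hits / k


# Hypothetical ranking: relevant, irrelevant, relevant, irrelevant.
ranking = [True, False, True, False]
print(precision_at_k(ranking, 1))   # 1.0
print(precision_at_k(ranking, 2))   # 0.5
print(precision_at_k(ranking, 10))  # 0.2 (only 4 results retrieved)
```

Some evaluations instead divide by the number of results actually returned when that is smaller than k; which convention applies should be stated alongside any reported score.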

Example

Query: "best machine learning libraries"
Retrieved (top-10):
1. Scikit-learn (relevant)
2. Restaurant (irrelevant)
3. TensorFlow (relevant)
4. Sports news (irrelevant)
5. PyTorch (relevant)
6. Blog post (irrelevant)
7. Keras (relevant)
8. Random article (irrelevant)
9. Numpy (relevant)
10. Forum post (irrelevant)

P@1 = 1/1 = 1.0 (first result relevant)
P@3 = 2/3 = 0.67
P@5 = 3/5 = 0.6
P@10 = 5/10 = 0.5 (half of top-10 relevant)
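
The same numbers can be reproduced with a running count of relevant results, which gives P@k at every cutoff in one pass. This is a sketch for checking the arithmetic; the variable names are illustrative, and the labels list simply encodes the ten relevance judgments from the ranking above.

```python
from itertools import accumulate

# Relevance labels for the ten results in the example, in rank order
# (1 = relevant, 0 = irrelevant).
labels = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Running count of relevant results seen at each rank.
hits_so_far = list(accumulate(labels))

# P@k for every cutoff k = 1..10: relevant results in the top k, divided by k.
p_at = {k: hits / k for k, hits in enumerate(hits_so_far, start=1)}

for k in (1, 3, 5, 10):
    print(f"P@{k} = {p_at[k]:.2f}")
# P@1 = 1.00
# P@3 = 0.67
# P@5 = 0.60
# P@10 = 0.50
```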

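Mean P@k (standard evaluation):
Average P@k over all queries in the test set at one fixed k, not over different values of k for a single query. For illustration, if a hypothetical second query scored P@10 = 0.7, mean P@10 over the two queries would be (0.5 + 0.7) / 2 = 0.6.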
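
In evaluation campaigns, mean P@k is usually taken across queries at a fixed cutoff. The sketch below shows that computation under stated assumptions: the first entry is the example ranking above, while the two other queries and their labels are made up purely for illustration, and the helper names are hypothetical.

```python
from statistics import mean


def precision_at_k(labels, k):
    """Relevant results among the top k, divided by k."""
    return sum(labels[:k]) / k


# Rank-ordered relevance labels per query (1 = relevant, 0 = irrelevant).
# Only the first query comes from the example; the others are invented.
runs = {
    "best machine learning libraries": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "hypothetical query A": [1, 1, 1, 0, 1, 1, 0, 1, 1, 0],
    "hypothetical query B": [0, 0, 1, 0, 0, 1, 0, 0, 0, 0],
}


def mean_precision_at_k(runs, k):
    """Average P@k over all queries at the same cutoff k."""
    return mean(precision_at_k(labels, k) for labels in runs.values())


# Per-query P@10 is 0.5, 0.7 and 0.2, so the mean is 1.4 / 3.
print(round(mean_precision_at_k(runs, 10), 2))  # 0.47
```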

Variants and history

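P@k is one of the classical and most intuitive metrics in information retrieval (IR). TREC evaluations report P@k averaged across queries. P@10 is a standard cutoff in web IR because it matches a typical search result page; P@3 is sometimes used for mobile, where less fits on the screen. Modern systems often report several cutoffs (P@1, P@5, P@10, P@100) for a fuller picture. P@k is complemented by MRR (rank of the first relevant result), MAP (ranking quality across all relevant documents), and NDCG (graded relevance gains).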

When to use it

Use P@k when:

User experience with top results matters
Typical user views only top-k results
Precision at specific depth needed
Comparing systems’ early ranking quality
Mobile or constrained-space scenarios

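P@k is practical and interpretable, but it ignores the order of results within the top k and the total number of relevant documents, so it should be combined with other metrics (recall, ranking quality) for a holistic evaluation.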