Hard Negative Mining
What it is
Hard negative mining is the practice of selecting training negatives that score highly under a retriever (the current model, or a baseline such as BM25) but are not actually relevant: the “confusing” cases that provide the strongest learning signal. Easy negatives (random passages) contribute little gradient once the model has learned basic semantics. Hard negatives force the model to make finer distinctions.
[illustrate: Easy negative far from query in embedding space (small gradient); hard negative close to query but not positive (large gradient, strong learning signal)]
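The gradient claim can be made concrete with a small sketch. It assumes a softmax contrastive loss (the notes above do not fix a loss), for which the gradient with respect to each negative's score equals that negative's softmax probability. The scores are made up for illustration and do not come from a real model.

```python
import math

def negative_gradients(pos_score, neg_scores):
    """Gradient of -log softmax(pos) with respect to each negative score.

    For L = -log(exp(s_pos) / sum_j exp(s_j)), dL/ds_neg = softmax(s)_neg.
    """
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps[1:]]

# Illustrative scores (assumed): one positive, one hard negative scored
# close to it, and one easy negative scored far below.
hard_grad, easy_grad = negative_gradients(pos_score=5.0, neg_scores=[4.5, -3.0])
print(f"hard negative gradient: {hard_grad:.4f}")
print(f"easy negative gradient: {easy_grad:.6f}")
```

With these scores the hard negative receives a gradient several orders of magnitude larger than the easy one, which is the sense in which easy negatives stop contributing once the model separates them well.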
How it works
Static BM25 negatives
- Retrieve top-k passages via BM25 for each training query
- Remove known positives
- The remaining passages are lexically hard (they share query terms) but not necessarily semantically hard
- Used by DPR; cheap to compute once
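The steps above can be sketched in plain Python. This is a minimal illustration, not DPR's pipeline: the Okapi BM25 scorer is hand-rolled with common default parameters (`k1=1.5`, `b=0.75`), and the whitespace tokenizer, the toy corpus, and the function names are assumptions made for the example.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in set(tokenize(query)):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def mine_bm25_negatives(query, docs, positive_ids, k=3):
    """Return ids of the top-k BM25 hits that are not labeled positive."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if i not in positive_ids]

# Toy corpus (assumed for illustration); passage 0 is the labeled positive.
corpus = [
    "earthquakes occur at tectonic plate boundaries",
    "earthquakes are measured using the richter scale",
    "the 1906 san francisco earthquake killed many people",
    "bread is made from flour water and yeast",
    "what causes tides is the pull of the moon",
]
negatives = mine_bm25_negatives("what causes earthquakes", corpus,
                                positive_ids={0}, k=3)
print(negatives)
```

In this toy run the passage about tides ranks first because it shares "what causes" with the query, which shows how BM25 negatives can be lexically hard while being semantically unrelated. Because positives are removed after retrieval, the function may return fewer than `k` negatives.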
Dynamic ANN negatives (ANCE-style)
- Train for N steps → re-encode corpus → rebuild ANN index
- Retrieve top-k passages from current model’s index
- Remove positives → dense hard negatives
- Repeat cycle
- Expensive but provides the hardest negatives relative to current model state
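The cycle above might be structured as in the following sketch. It is a synchronous simplification: exact inner-product search stands in for the ANN index, `ToyModel` is a hypothetical encoder, and `noop_train_step` is a placeholder that does not update anything, so the example shows only the control flow of train, re-encode, re-mine.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def build_index(encode, corpus):
    # Re-encode every passage with the current model. A real system would
    # load these vectors into an ANN index instead of a plain list.
    return [encode(passage) for passage in corpus]

def search(index, query_vec, k):
    # Exact top-k by inner product, standing in for an ANN lookup.
    ranked = sorted(range(len(index)),
                    key=lambda i: dot(index[i], query_vec), reverse=True)
    return ranked[:k]

def mine_negatives(model, corpus, queries, positives, k):
    index = build_index(model.encode, corpus)
    mined = {}
    for qid, text in queries.items():
        hits = search(index, model.encode(text), k)
        mined[qid] = [i for i in hits if i not in positives[qid]]
    return mined

def train_with_refresh(model, corpus, queries, positives, train_step,
                       cycles=2, steps_per_cycle=3, k=2):
    negatives = mine_negatives(model, corpus, queries, positives, k)
    for _ in range(cycles):
        for _ in range(steps_per_cycle):
            train_step(model, queries, positives, negatives)
        # The model has changed, so the old negatives are stale: re-mine.
        negatives = mine_negatives(model, corpus, queries, positives, k)
    return negatives

class ToyModel:
    """Hypothetical encoder: term counts over a tiny fixed vocabulary."""
    vocab = ["earthquakes", "tectonic", "volcanic", "magma", "bread"]

    def encode(self, text):
        tokens = text.lower().split()
        return [float(tokens.count(w)) for w in self.vocab]

corpus = [
    "earthquakes occur at tectonic plate boundaries",  # labeled positive
    "volcanic eruptions and earthquakes share tectonic origins",
    "magma chambers feed volcanic eruptions",
    "bread is made from flour",
]
queries = {"q1": "tectonic causes of earthquakes"}
positives = {"q1": {0}}
step_calls = []

def noop_train_step(model, queries, positives, negatives):
    # Placeholder: a real step would compute a contrastive loss on the
    # current negatives and update the model's parameters.
    step_calls.append(1)

final_negatives = train_with_refresh(ToyModel(), corpus, queries,
                                     positives, noop_train_step)
print(final_negatives)
```

The expensive part in practice is `build_index`, which re-encodes the entire corpus on every refresh; the refresh interval (`steps_per_cycle` here) trades negative freshness against that cost.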
Cross-encoder mined negatives
- Score candidates with a cross-encoder teacher
- Use high-scoring but non-relevant passages as negatives
- Very high quality but requires cross-encoder inference over large candidate sets
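A minimal sketch of this selection step follows. The `teacher_score` argument stands in for a cross-encoder; `toy_teacher` is a hypothetical token-overlap scorer used only so the example runs without a model, and the passages are invented.

```python
def mine_with_teacher(query, candidates, positive_ids, teacher_score, n):
    """Pick the n candidates the teacher scores highest, skipping labeled positives.

    candidates: dict mapping passage id -> passage text
    teacher_score: callable (query, passage) -> float
    """
    scored = [(teacher_score(query, text), pid)
              for pid, text in candidates.items()
              if pid not in positive_ids]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:n]]

def toy_teacher(query, passage):
    # Hypothetical stand-in for a cross-encoder: Jaccard token overlap.
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / len(q | p)

candidates = {
    "p0": "earthquakes occur at plate boundaries",  # labeled positive
    "p1": "earthquakes are measured on the richter scale",
    "p2": "bread is made from flour",
    "p3": "what causes tides",
}
teacher_negatives = mine_with_teacher("what causes earthquakes", candidates,
                                      positive_ids={"p0"},
                                      teacher_score=toy_teacher, n=2)
print(teacher_negatives)
```

The cost noted above comes from calling `teacher_score` once per query-candidate pair, so candidates are usually pre-filtered by a cheaper retriever before the teacher sees them.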
False negative problem
Hard negatives mined by retrieval can be false negatives: passages that are relevant but were never annotated as positive in the training set. Large-scale, sparsely labeled datasets such as MS MARCO have this problem. Denoising approaches score the mined negatives with a cross-encoder and drop those it judges likely to be relevant.
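One simple way to apply such a filter is a score threshold, sketched below. The threshold value, the passage ids, and the teacher scores are all assumptions for illustration; a suitable cutoff depends on the teacher and the dataset.

```python
def denoise_negatives(scored_negatives, max_score):
    """Split mined negatives by teacher score.

    scored_negatives: list of (passage_id, teacher_score) pairs
    Returns (kept, suspected_false_negatives).
    """
    kept = [pid for pid, score in scored_negatives if score < max_score]
    suspected = [pid for pid, score in scored_negatives if score >= max_score]
    return kept, suspected

# Made-up teacher scores in [0, 1]; "n3" looks relevant to the teacher
# even though it was never labeled positive.
mined = [("n1", 0.35), ("n2", 0.20), ("n3", 0.92), ("n4", 0.10)]
kept, suspected = denoise_negatives(mined, max_score=0.9)
print(kept, suspected)
```

Suspected false negatives are simply dropped from the negative set here; whether to discard them or relabel them as positives is a separate design choice.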
Example
Query: "what causes earthquakes?"
Positive: "Earthquakes occur at tectonic plate boundaries where..."
BM25 top-5 (negatives after removing the positive):
"Earthquakes can be measured using the Richter scale..." ← hard: related topic
"The 1906 San Francisco earthquake killed 3000 people..." ← hard: same entity
"Tsunamis are often triggered by undersea earthquakes..." ← medium
"Plate tectonics describes the movement of Earth's..." ← hard: causal mechanism
Dynamic ANN negative (ANCE):
"Volcanic eruptions release pressure from magma chambers" ← very hard: model currently ranks this above the positive
When to use it
- Always start with in-batch negatives; they’re free
- Add BM25 hard negatives as a cheap improvement
- Use dynamic ANCE-style mining when recall@k is the primary metric and compute is available
- Use cross-encoder mined negatives when negative quality matters most and you have the labeled data to train a teacher and the budget to run it
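The first two recommendations can be combined in a single loss, as in this sketch. It assumes a softmax cross-entropy objective over inner-product scores, with mined hard negatives appended as extra candidates shared by every query in the batch; the vectors are toy values.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def contrastive_loss(query_vecs, positive_vecs, hard_negative_vecs=()):
    """Mean softmax cross-entropy where query i's target is positive i.

    Every other positive in the batch acts as an in-batch negative; any
    hard_negative_vecs are appended as extra candidates for all queries.
    """
    candidates = list(positive_vecs) + list(hard_negative_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [dot(q, c) for c in candidates]
        m = max(logits)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[i]
    return total / len(query_vecs)

# Toy batch of two query/positive pairs (assumed values).
batch_queries = [[1.0, 0.0], [0.0, 1.0]]
batch_positives = [[1.0, 0.0], [0.0, 1.0]]
loss_in_batch = contrastive_loss(batch_queries, batch_positives)
loss_with_hard = contrastive_loss(batch_queries, batch_positives,
                                  hard_negative_vecs=[[0.9, 0.0]])
print(loss_in_batch, loss_with_hard)
```

In-batch negatives cost nothing extra because the positive vectors are already computed for the batch; the hard negative adds one more encoded passage and raises the loss for the query it sits close to.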