Hard Negative Mining

What it is

Hard negative mining is the practice of selecting training negatives that score highly under the current retrieval model but are not actually relevant: the “confusing” cases that provide the strongest learning signal. Easy negatives (for example, randomly sampled passages) contribute little gradient once the model has learned basic semantics, because the model already scores them far below the positive. Hard negatives force the model to make finer distinctions.

[illustrate: Easy negative far from query in embedding space (small gradient); hard negative close to query but not positive (large gradient, strong learning signal)]
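
The gradient claim can be checked numerically. The sketch below assumes a softmax (InfoNCE-style) contrastive loss over similarity scores, which the text above does not specify. Under that assumption, the gradient of the loss with respect to a negative's score equals the softmax probability assigned to that negative. The function name and the scores are made up for illustration.

```python
import math

def negative_gradients(pos_score, neg_scores):
    """Gradient of -log softmax(positive) with respect to each negative's score.

    For this loss, d(loss)/d(neg_score_i) is the softmax probability
    that the model assigns to negative i.
    """
    scores = [pos_score] + list(neg_scores)
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps[1:]]

# Hypothetical scores: positive at 8.0, easy negative at 1.0, hard negative at 7.5.
easy_grad, hard_grad = negative_gradients(8.0, [1.0, 7.5])
print(f"easy negative gradient: {easy_grad:.6f}")
print(f"hard negative gradient: {hard_grad:.6f}")
```

With these illustrative scores the hard negative receives a gradient several hundred times larger than the easy one, which is the effect described above.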

How it works

Static BM25 negatives

Retrieve top-k passages via BM25 for each training query
Remove known positives
The remaining passages are lexically hard (they share query terms) but not necessarily semantically hard
Used by DPR; cheap to compute once
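
The steps above can be sketched in plain Python. This is a minimal illustration, not DPR's actual pipeline: the BM25 scorer is a small hand-written Okapi variant with commonly used default parameters (k1=1.5, b=0.75), and the corpus, tokenizer, and function names are assumptions made for the example. A real setup would typically use a search engine or a BM25 library over the full corpus.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Lowercase and keep alphanumeric runs; a stand-in for a real analyzer.
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    n = len(docs)
    query_terms = set(tokenize(query))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def mine_bm25_negatives(query, docs, positive_ids, k):
    """Retrieve the top-k documents by BM25, then drop known positives."""
    scores = bm25_scores(query, docs)
    ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return [i for i in ranked[:k] if i not in positive_ids]

# Toy corpus (illustrative); document 0 is the annotated positive.
docs = [
    "Earthquakes occur at tectonic plate boundaries.",
    "Earthquakes can be measured using the Richter scale.",
    "Tsunamis are often triggered by undersea earthquakes.",
    "Sourdough bread needs a long fermentation.",
    "The stock market closed higher today.",
]
negatives = mine_bm25_negatives("what causes earthquakes?", docs, positive_ids={0}, k=3)
print(negatives)
```

Because the negatives depend only on the corpus and the queries, this step can run once before training and its output can be cached.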

Dynamic ANN negatives (ANCE-style)

Train for N steps → re-encode corpus → rebuild ANN index
Retrieve top-k passages from current model’s index
Remove positives → dense hard negatives
Repeat cycle
Expensive (each cycle re-encodes the whole corpus), but yields the hardest negatives relative to the current model state
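
The refresh cycle above might look like the following sketch. It is not the ANCE implementation: an exhaustive dot-product search stands in for the ANN index, `train_steps` is a hypothetical callback that would run N optimizer steps with the current negatives, and `encode` is a hypothetical encoder (here a fixed lookup table so the example runs).

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mine_dense_negatives(query_vecs, passage_vecs, positives, k):
    """Exhaustive top-k search standing in for an ANN index lookup."""
    negatives = {}
    for qid, qvec in query_vecs.items():
        ranked = sorted(
            passage_vecs,
            key=lambda pid: dot(qvec, passage_vecs[pid]),
            reverse=True,
        )
        # Keep the top-k retrieved passages, minus the annotated positives.
        negatives[qid] = [pid for pid in ranked[:k] if pid not in positives[qid]]
    return negatives

def train_with_refresh(encode, train_steps, queries, passages, positives, cycles, k):
    negatives = {qid: [] for qid in queries}
    for _ in range(cycles):
        train_steps(negatives)  # train for N steps with the current negatives
        # Re-encode the corpus and queries with the updated model.
        passage_vecs = {pid: encode(text) for pid, text in passages.items()}
        query_vecs = {qid: encode(text) for qid, text in queries.items()}
        negatives = mine_dense_negatives(query_vecs, passage_vecs, positives, k)
    return negatives

# Toy stand-ins (all hypothetical): a frozen "model" and a no-op trainer.
toy_vectors = {
    "what causes earthquakes?": (1.0, 0.0),
    "plate boundaries": (0.8, 0.1),
    "magma chambers": (0.9, 0.3),
    "sourdough": (-0.5, 0.9),
}
queries = {"q1": "what causes earthquakes?"}
passages = {"pos": "plate boundaries", "volcano": "magma chambers", "bread": "sourdough"}
positives = {"q1": {"pos"}}

negatives = train_with_refresh(
    encode=lambda text: toy_vectors[text],
    train_steps=lambda current_negatives: None,
    queries=queries,
    passages=passages,
    positives=positives,
    cycles=2,
    k=2,
)
print(negatives)
```

In this toy run the "volcano" passage outranks the positive, so it is exactly the kind of negative the next training cycle should see.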

Cross-encoder mined negatives

Score candidates with a cross-encoder teacher
Use high-scoring but non-relevant passages as negatives
Very high quality but requires cross-encoder inference over large candidate sets
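
A minimal sketch of the selection step, assuming a teacher exposed as a function `teacher_score(query, passage)` that returns a relevance score. That interface, the function names, and the token-overlap scorer used as a stand-in are all hypothetical; a real teacher would be a trained cross-encoder model.

```python
def mine_with_teacher(query, candidates, positive_ids, teacher_score, num_negatives):
    """Rank non-positive candidates by teacher score and keep the highest ones."""
    scored = [
        (teacher_score(query, text), pid)
        for pid, text in candidates.items()
        if pid not in positive_ids
    ]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:num_negatives]]

def toy_teacher_score(query, passage):
    # Stand-in for a cross-encoder: fraction of query tokens found in the passage.
    q_tokens = set(query.lower().split())
    p_tokens = set(passage.lower().split())
    return len(q_tokens & p_tokens) / len(q_tokens)

candidates = {
    "p1": "earthquakes occur at plate boundaries",  # annotated positive
    "p2": "what scale measures earthquakes",
    "p3": "earthquakes trigger tsunamis",
    "p4": "bread is baked in ovens",
}
hard_negatives = mine_with_teacher(
    "what causes earthquakes",
    candidates,
    positive_ids={"p1"},
    teacher_score=toy_teacher_score,
    num_negatives=2,
)
print(hard_negatives)
```

The cost noted above shows up in the list comprehension: the teacher runs once per query-candidate pair, so candidate sets are usually pre-filtered by a cheaper retriever.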

False negative problem

Hard negatives from retrieval can be false negatives: relevant passages that were never annotated as positive in the training set. Sparsely annotated large-scale datasets such as MS MARCO have this problem. Denoising approaches use a cross-encoder to identify passages in the negative set that are likely relevant and filter them out.
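
One way to sketch the denoising step is a score threshold: any mined negative that the teacher scores at or above a cutoff is treated as a probable false negative and dropped. The threshold value, the function names, and the lookup-table scores below are assumptions for illustration, not values from any published method.

```python
def denoise_negatives(query, mined_ids, passages, teacher_score, max_score):
    """Split mined negatives into kept ones and probable false negatives."""
    kept, dropped = [], []
    for pid in mined_ids:
        if teacher_score(query, passages[pid]) >= max_score:
            dropped.append(pid)  # teacher thinks this is relevant: likely unlabeled positive
        else:
            kept.append(pid)
    return kept, dropped

# Hypothetical teacher scores in [0, 1], keyed by passage text.
toy_scores = {
    "Earthquakes can be measured using the Richter scale": 0.35,
    "Plate tectonics describes the movement of plates": 0.92,
    "Tsunamis are often triggered by undersea earthquakes": 0.20,
}
passages = {
    "richter": "Earthquakes can be measured using the Richter scale",
    "tectonics": "Plate tectonics describes the movement of plates",
    "tsunami": "Tsunamis are often triggered by undersea earthquakes",
}
kept, dropped = denoise_negatives(
    "what causes earthquakes?",
    ["richter", "tectonics", "tsunami"],
    passages,
    teacher_score=lambda query, passage: toy_scores[passage],
    max_score=0.9,  # assumed cutoff; in practice tuned on held-out data
)
print(kept, dropped)
```

A stricter cutoff removes more false negatives but also discards some of the hardest true negatives, so the threshold trades label noise against difficulty.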

Example

Query: "what causes earthquakes?"
Positive: "Earthquakes occur at tectonic plate boundaries where..."

BM25 top-5 (negatives after removing positive):
  "Earthquakes can be measured using the Richter scale..."  ← hard: related topic
  "The 1906 San Francisco earthquake killed 3000 people..." ← hard: same entity
  "Tsunamis are often triggered by undersea earthquakes..." ← medium
  "Plate tectonics describes the movement of Earth's..."    ← hard: causal mechanism

Dynamic ANN negative (ANCE):
  "Volcanic eruptions release pressure from magma chambers" ← very hard: model
                                                             currently ranks this
                                                             above the positive

When to use it

Always start with in-batch negatives; they are free because the other passages in the batch are already encoded
Add BM25 hard negatives as a cheap improvement
Use dynamic ANCE-style mining when recall@k is the primary metric and compute is available
Use cross-encoder mined negatives for the highest quality, when there is budget for labeled data and for cross-encoder inference
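
To show how the first two recommendations combine, the sketch below computes a softmax contrastive loss in which each query's negatives are the other queries' positives (in-batch negatives), optionally extended with mined hard negatives. The loss form, the function name, and the toy vectors are assumptions for illustration; real training would use a tensor library and learned embeddings.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def in_batch_loss(query_vecs, positive_vecs, hard_negative_vecs=()):
    """Mean softmax loss; positive_vecs[i] is the positive for query_vecs[i].

    Every other query's positive acts as a negative at no extra encoding
    cost. Mined hard negatives are appended to the shared candidate pool.
    """
    candidates = list(positive_vecs) + list(hard_negative_vecs)
    total = 0.0
    for i, q in enumerate(query_vecs):
        scores = [dot(q, c) for c in candidates]
        m = max(scores)  # log-sum-exp with max subtraction for stability
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[i]
    return total / len(query_vecs)

# Toy batch of two queries with 2-d embeddings (illustrative values).
query_vecs = [(1.0, 0.0), (0.0, 1.0)]
positive_vecs = [(1.0, 0.0), (0.0, 1.0)]
hard_negative_vecs = [(0.9, 0.1)]  # close to the first query, but not its positive

base_loss = in_batch_loss(query_vecs, positive_vecs)
hard_loss = in_batch_loss(query_vecs, positive_vecs, hard_negative_vecs)
print(f"in-batch only: {base_loss:.4f}")
print(f"with hard negative: {hard_loss:.4f}")
```

Adding the hard negative raises the loss for the same embeddings, which means the model has more left to learn from that batch.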