DeepImpact

What it is

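DeepImpact (Mallia et al., 2021) is a learned sparse retrieval model that uses a BERT encoder to assign an importance (impact) score to each document term, then stores those scores in a standard inverted index. Unlike SPLADE, DeepImpact does not learn vocabulary expansion inside the model: it scores only the terms present in the text it is given. In the original paper that text is the document after a separate docT5query expansion step, so any added terms come from preprocessing rather than from DeepImpact itself. This makes it simpler to implement and compatible with inverted index infrastructure that supports precomputed term weights.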

[illustrate: Document tokens → BERT → per-token scalar impact scores → sparse posting list with scores; query terms look up impact scores at retrieval time]

How it works

  1. Impact scoring:

    • Encode the document with BERT
    • For each unique term in the document, take the contextualized representation of its first occurrence and project it to a scalar score (a small MLP in the original paper)
    • Scores are quantized to integers for inverted index storage
  2. No learned expansion:

    • Only terms present in the input document text are scored; the model itself adds no new terms
    • Simpler than SPLADE; no FLOPS regularization needed
  3. Indexing:

    • Store (term, impact_score) pairs in a standard inverted index
    • Query terms look up their precomputed impact scores at retrieval time
    • Compatible with Anserini / Lucene infrastructure
  4. Scoring at retrieval time:

    • For each query term, retrieve its impact score from the index
    • Sum impact scores across matched query terms
    • No encoder inference at query time
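
The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-token scores are hand-written stand-ins for what the BERT encoder and projection would produce, the function names (first_occurrence_scores, quantize, build_index, search) are hypothetical, and the 8-bit linear quantization with floor rounding is one plausible choice rather than a documented setting.

```python
from collections import defaultdict


def first_occurrence_scores(tokens, token_scores):
    """Keep one score per unique term: the score at its first occurrence."""
    scores = {}
    for term, score in zip(tokens, token_scores):
        if term not in scores:
            scores[term] = score
    return scores


def quantize(score, max_score, bits=8):
    """Linearly map a score in [0, max_score] to an integer in [0, 2**bits - 1].

    Floor rounding is an assumption made for this sketch.
    """
    if max_score <= 0:
        raise ValueError("max_score must be positive")
    levels = (1 << bits) - 1
    clipped = min(max(score, 0.0), max_score)
    return int(clipped / max_score * levels)


def build_index(docs, bits=8):
    """Build {term: [(doc_id, quantized_impact), ...]} from scored documents.

    docs maps doc_id -> (tokens, token_scores); the scores stand in for
    the encoder output and are NOT computed here.
    """
    per_doc = {
        doc_id: first_occurrence_scores(tokens, scores)
        for doc_id, (tokens, scores) in docs.items()
    }
    # A single global maximum keeps impacts comparable across documents.
    max_score = max(s for scores in per_doc.values() for s in scores.values())
    index = defaultdict(list)
    for doc_id, scores in per_doc.items():
        for term, score in scores.items():
            index[term].append((doc_id, quantize(score, max_score, bits)))
    return index


def search(index, query_terms, k=10):
    """Sum stored impacts over the unique query terms; no encoder is called."""
    totals = defaultdict(int)
    for term in set(query_terms):
        for doc_id, impact in index.get(term, []):
            totals[doc_id] += impact
    # Highest score first; ties broken by doc_id for determinism.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))[:k]


# Hypothetical per-token scores, as if produced by the trained model.
docs = {
    "d1": (["neural", "ranking", "with", "neural", "models"],
           [4.0, 2.0, 0.1, 3.0, 1.0]),
    "d2": (["bm25", "ranking", "baseline"], [3.0, 1.0, 0.5]),
}
index = build_index(docs)
print(search(index, ["neural", "ranking"]))  # [('d1', 382), ('d2', 63)]
```

Note that the second occurrence of "neural" in d1 is ignored, and that the query side is a plain lookup and sum, which is why no model inference is needed at retrieval time. A production system would store the postings in an engine such as Lucene rather than a Python dictionary.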

Variants and history

DeepImpact (2021) was an early demonstration that learned impact scores improve over BM25 without ANN infrastructure. uniCOIL followed with a similar approach, reducing COIL's per-token vectors to a single scalar weight per token and, unlike DeepImpact, also weighting query terms with the encoder. SPLADE extended the concept with learned vocabulary expansion and reported higher effectiveness at the cost of more complex training. DeepImpact remains a useful baseline for systems that want semantic scoring without index infrastructure changes.

When to use it

Use DeepImpact when:

  • You have a standard inverted index and cannot add ANN infrastructure
  • Query-time latency constraints prevent encoder inference
  • A moderate improvement over BM25 with minimal architectural change is acceptable
  • Learned vocabulary expansion (as in SPLADE) is too complex for your setup

See also