KNRM

What it is

KNRM (Kernel-based Neural Ranking Model, Xiong et al., 2017) replaces DRMM’s hard histogram bins with soft RBF (Gaussian) kernels over the query-document term interaction matrix. Each kernel acts as a soft count of term matches around one similarity level, and the resulting kernel features are combined by a learning-to-rank layer. Because the kernels are differentiable, the entire model, including the word embeddings, is trained end-to-end.
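The difference between hard bins and soft kernels can be seen on a handful of similarity values. The following is a minimal sketch, not code from either paper: the similarity values, bin edges, kernel means and σ are made up for illustration.

```python
import numpy as np

# Cosine similarities between one query term and five document terms
# (illustrative values, not taken from any dataset).
sims = np.array([0.95, 0.52, 0.48, 0.10, -0.30])

# Hard binning (histogram-style): each similarity lands in exactly one bin,
# so 0.52 and 0.48 are counted in different bins despite being nearly equal.
edges = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
hard_counts, _ = np.histogram(sims, bins=edges)

# Soft binning (kernel-style): every similarity contributes to every kernel,
# weighted by its distance to the kernel mean mu.
mus = np.array([-0.75, -0.25, 0.25, 0.75])  # assumed kernel means (bin centres)
sigma = 0.1                                 # assumed kernel width
weights = np.exp(-(sims[:, None] - mus[None, :]) ** 2 / (2 * sigma ** 2))
soft_counts = weights.sum(axis=0)           # one soft count per kernel

print("hard counts:", hard_counts)
print("soft counts:", np.round(soft_counts, 3))
```

The hard counts change abruptly when a similarity crosses a bin edge, so no gradient can flow back to the embeddings. The soft counts vary smoothly with each similarity, which is what makes end-to-end training possible.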

[illustrate: Similarity matrix → K Gaussian kernels at different μ values → log-sum per query term per kernel → linear combination → relevance score]

How it works

  1. Interaction matrix:

    • Word embeddings (trained end-to-end) for query and document terms
    • Cosine similarity matrix M[i][j] between query term i and document term j
  2. Kernel pooling:

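    • Define K RBF kernels with different means μ_k (e.g., {1.0, 0.9, 0.7, …, -0.9}) and widths σ_k; the μ = 1.0 kernel uses a very small σ so that it captures exact matches
    • For query term i: K_k(i) = log(Σ_j exp(-(M[i][j] - μ_k)² / (2σ_k²)))
    • Each kernel is a soft count of the document terms whose similarity to query term i lies near μ_k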
  3. Ranking layer:

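    • For each query term, collect its K log-kernel scores
    • Sum across query terms to obtain one K-dimensional feature vector (the original KNRM applies no IDF weighting; IDF-based term gating belongs to DRMM)
    • Linear combination followed by tanh → final score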
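
The three steps above can be sketched as a single forward pass in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the random embeddings, the four-kernel configuration and the weights w and b are all hypothetical, and a real model would learn the embeddings and weights from ranking data.

```python
import numpy as np

def cosine_similarity_matrix(q_emb, d_emb):
    """M[i][j] = cosine similarity of query term i and document term j."""
    q = q_emb / np.linalg.norm(q_emb, axis=1, keepdims=True)
    d = d_emb / np.linalg.norm(d_emb, axis=1, keepdims=True)
    return q @ d.T

def kernel_pooling(M, mus, sigmas, eps=1e-10):
    """Return one feature per kernel, shape (K,)."""
    diff = M[:, :, None] - mus[None, None, :]             # (n_query, n_doc, K)
    k = np.exp(-diff ** 2 / (2.0 * sigmas[None, None, :] ** 2))
    per_term = k.sum(axis=1)                              # soft counts, (n_query, K)
    # eps keeps the log finite when a kernel matches nothing.
    return np.log(per_term + eps).sum(axis=0)             # sum of logs over query terms

def knrm_score(q_emb, d_emb, mus, sigmas, w, b):
    """Linear combination of kernel features, squashed by tanh."""
    phi = kernel_pooling(cosine_similarity_matrix(q_emb, d_emb), mus, sigmas)
    return float(np.tanh(w @ phi + b))

# Hypothetical inputs: 3 query terms, 20 document terms, 8-dim embeddings.
rng = np.random.default_rng(0)
q_emb = rng.normal(size=(3, 8))
d_emb = rng.normal(size=(20, 8))

# Assumed small kernel set for illustration: one exact-match kernel
# (mu = 1.0, tiny sigma) plus three soft-match kernels.
mus = np.array([1.0, 0.5, 0.0, -0.5])
sigmas = np.array([1e-3, 0.1, 0.1, 0.1])

# Untrained placeholder weights for the ranking layer.
w = rng.normal(scale=0.01, size=mus.shape[0])
b = 0.0

score = knrm_score(q_emb, d_emb, mus, sigmas, w, b)
print("relevance score:", score)
```

Every operation here is differentiable with respect to the embeddings, so in an autodiff framework the ranking loss can update the embeddings through the kernels.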

Variants and history

KNRM (2017) was trainable end-to-end, unlike DRMM, whose histogram step is not differentiable and which therefore relied on fixed pre-trained embeddings for the interaction step. Conv-KNRM (Dai et al., 2018) added n-gram convolutions before the similarity matrix for phrase-level matching. These interaction models were largely superseded by transformer-based models (ColBERT, cross-encoders) but remain relevant for low-resource deployments and interpretability research.

When to use it

KNRM is now mainly of historical interest. Consider it when:

  • Transformer infrastructure is unavailable (embedded/edge deployment)
  • Interpretability of the interaction matrix is required
  • The goal is to study the evolution of neural ranking models

See also