KNRM
What it is
KNRM (Kernel-based Neural Ranking Model, Xiong et al., 2017) replaces DRMM’s hard histogram bins with soft RBF (Gaussian) kernels over the query-document term interaction matrix. Each kernel counts soft term matches around a particular similarity level, and the pooled kernel outputs form a feature vector that a learning-to-rank layer combines into a relevance score. The entire model, including the word embeddings, is trained end-to-end.
[illustrate: Similarity matrix → K Gaussian kernels at different μ values → log-sum per query term per kernel → linear combination → relevance score]
How it works
Interaction matrix:
- Word embeddings (trained end-to-end) for query and document terms
- Cosine similarity matrix M[i][j] between query term i and document term j
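As a minimal sketch of the interaction matrix, the snippet below computes cosine similarities in plain Python. The function name `cosine_similarity_matrix` and the toy 2-dimensional embeddings are illustrative assumptions; in a trained model the vectors would come from the learned embedding layer.

```python
import math

def cosine_similarity_matrix(query_vecs, doc_vecs):
    """Return M where M[i][j] is the cosine similarity between
    query term i and document term j."""
    def unit(v):
        norm = math.sqrt(sum(x * x for x in v))
        # Zero vectors (e.g. padding) are left unchanged to avoid dividing by zero.
        return [x / norm for x in v] if norm > 0 else list(v)

    q_unit = [unit(v) for v in query_vecs]
    d_unit = [unit(v) for v in doc_vecs]
    # Dot product of unit vectors equals cosine similarity.
    return [[sum(a * b for a, b in zip(q, d)) for d in d_unit] for q in q_unit]

# Toy embeddings (hypothetical values, not trained).
query_vecs = [[1.0, 0.0], [0.0, 2.0]]
doc_vecs = [[2.0, 0.0], [1.0, 1.0], [0.0, -1.0]]
M = cosine_similarity_matrix(query_vecs, doc_vecs)
```

Here M has one row per query term and one column per document term, with every entry in the range [-1, 1].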
Kernel pooling:
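- Define K RBF kernels with different means μ_k (e.g., {1.0, 0.9, 0.7, …, -0.9}) and widths σ_k; the μ = 1.0 kernel, given a very small σ, acts as an exact-match counter
- For query term i and kernel k: K_k(i) = log(Σ_j exp(-(M[i][j] - μ_k)² / (2σ_k²)))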
- Captures soft matching at different similarity thresholds
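The kernel pooling step can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the three kernel means and widths are chosen only for the example, and the `eps` clamp before the logarithm is an assumed guard against log(0) when no document term falls near a kernel.

```python
import math

def kernel_pooling(M, mus, sigmas, eps=1e-10):
    """For each query term (row of M), return its K log-kernel scores."""
    features = []
    for row in M:
        scores = []
        for mu, sigma in zip(mus, sigmas):
            # Soft count of document terms whose similarity is close to mu.
            soft_count = sum(
                math.exp(-((m - mu) ** 2) / (2 * sigma ** 2)) for m in row
            )
            # Assumed clamp: avoids log(0) when the soft count underflows to zero.
            scores.append(math.log(max(soft_count, eps)))
        features.append(scores)
    return features

# Assumed kernel settings for illustration (three kernels only).
mus = [1.0, 0.5, 0.0]
sigmas = [0.001, 0.1, 0.1]

# Hypothetical similarity matrix: 2 query terms x 3 document terms.
M = [[1.0, 0.5, 0.0], [0.0, 0.0, 0.9]]
feats = kernel_pooling(M, mus, sigmas)
```

In this toy input the first query term has one exact match, so its μ = 1.0 kernel gives a soft count of 1 and a log score of 0. The second query term has no exact match, so the same kernel falls back to log(eps).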
Ranking layer:
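- For each query term, collect its K log-kernel scores into a K-dimensional vector
- Sum these vectors across query terms (an unweighted sum; unlike DRMM's term gating, KNRM applies no IDF weighting)
- Linear combination followed by tanh → final score: f(q, d) = tanh(w · φ + b), where φ is the summed vector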
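A minimal sketch of the ranking layer, assuming the per-term log-kernel features are already computed. It follows the scoring form in the KNRM paper, an unweighted sum over query terms followed by tanh(w · φ + b). The function name `knrm_score` and all numeric values are hypothetical; in practice `weights` and `bias` are learned together with the embeddings.

```python
import math

def knrm_score(kernel_features, weights, bias):
    """Sum per-term log-kernel features over query terms,
    then apply tanh(w . phi + b)."""
    num_kernels = len(weights)
    # phi[k] aggregates kernel k over all query terms (unweighted sum).
    phi = [sum(term[k] for term in kernel_features) for k in range(num_kernels)]
    return math.tanh(sum(w * p for w, p in zip(weights, phi)) + bias)

# Hypothetical inputs: 2 query terms x 3 kernels, plus made-up parameters.
kernel_features = [[0.0, 0.5, 1.0], [-2.0, 0.5, 0.0]]
weights = [0.5, 0.2, -0.1]
bias = 0.1
score = knrm_score(kernel_features, weights, bias)
```

Because of the tanh, the score lies in (-1, 1). Only the relative order of scores across candidate documents matters for ranking.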
Variants and history
KNRM (2017) is trainable end-to-end, unlike DRMM, which used fixed pre-trained embeddings for the interaction step: hard histogram counts are not differentiable with respect to the similarities, whereas RBF kernels are smooth, so gradients reach the embeddings. Conv-KNRM added n-gram convolutions before the similarity matrix for phrase-level matching. These interaction models were largely superseded by transformer-based models (ColBERT, cross-encoders) but remain relevant for low-resource deployments and interpretability research.
When to use it
KNRM is primarily historical. Consider it when:
- Transformer infrastructure is unavailable (embedded/edge deployment)
- Interpretability of the interaction matrix is required
- The goal is to study the evolution of neural ranking models