Neural IR
-
ANCE
Approximate Nearest Neighbor Negative Contrastive Estimation; improves dense retrieval training by dynamically refreshing hard negatives from the current model’s ANN index.
-
Atlas
Few-shot retrieval-augmented language model combining FiD reader with Contriever retriever; jointly fine-tuned to achieve strong few-shot performance on knowledge-intensive tasks.
-
BEIR
Benchmarking IR; heterogeneous benchmark of 18 retrieval datasets spanning 9 task types (such as fact checking, question answering, and citation prediction) that evaluates zero-shot generalization of retrieval models, typically ones trained on MS MARCO.
-
Binary Embeddings
Embeddings compressed to 1 bit per dimension; enables Hamming-distance similarity search using XOR and POPCNT instructions, dramatically reducing index size and retrieval latency.
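A minimal sketch of the idea, assuming sign-based binarization (one common choice, not the only one). The names binarize and hamming are illustrative and do not come from any particular library.

```python
def binarize(vec):
    """Pack the sign bits of a float vector into one Python int (1 bit per dimension)."""
    bits = 0
    for x in vec:
        bits = (bits << 1) | (1 if x > 0 else 0)
    return bits

def hamming(a, b):
    """Hamming distance: XOR the two codes, then count the set bits (the POPCNT step)."""
    return bin(a ^ b).count("1")

# Toy 4-dimensional embeddings, made up for illustration.
query = binarize([0.3, -1.2, 0.8, 0.1])
docs = {
    "d1": binarize([0.5, -0.4, 0.9, -0.2]),
    "d2": binarize([-0.7, 0.6, -0.1, -0.3]),
}
# Smaller Hamming distance means more similar.
ranked = sorted(docs, key=lambda d: hamming(query, docs[d]))
```

Production systems pack bits into fixed-width machine words and use hardware popcount; the Python int here only illustrates the arithmetic.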
-
ColBERTv2
Improved ColBERT with cross-encoder distillation and residual compression; dramatically reduces index size while matching or exceeding v1 effectiveness.
-
Contrastive Loss
Training objective that pulls similar pairs together and pushes dissimilar pairs apart in embedding space; the dominant loss function for dense retrieval and sentence embedding models.
-
Contriever
Unsupervised dense retrieval model trained with contrastive learning on unlabeled text; no labeled query-passage pairs required.
-
DeepImpact
Learns per-term impact scores for documents (first expanded with DocT5Query) using a BERT encoder, enabling semantic-aware scoring with a standard inverted index and no query-time expansion or neural inference.
-
DocT5Query
Document expansion via T5 query generation; generates synthetic queries a document might answer and appends them to the document before indexing, improving sparse retrieval recall.
-
DPR (Dense Passage Retrieval)
Dual BERT encoder model that retrieves passages by embedding queries and documents into a shared dense vector space; foundational bi-encoder for open-domain QA.
-
DRMM
Deep Relevance Matching Model (2016); interaction-based neural ranker using histogram-based local interaction features with term gating, designed explicitly for relevance matching rather than semantic similarity.
-
DSI (Differentiable Search Index)
Encodes an entire document corpus into a single seq2seq model; retrieval is performed by generating document identifiers directly from a query, without a separate index.
-
DSSM
Deep Structured Semantic Model (2013); the original neural dual-encoder for web search, using letter-trigram word hashing for inputs and MLP towers to learn query-document semantic similarity.
-
DUET
Dual network combining local (exact-match) and distributed (semantic) sub-models for relevance ranking; one of the first models to explicitly combine lexical and semantic signals.
-
FiD (Fusion-in-Decoder)
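Encodes each retrieved passage, paired with the question, independently with a T5 encoder, then lets the decoder attend over all encoded passages jointly; encoder cost grows linearly with the number of passages, making it more scalable than concatenating all passages into a single long input.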
-
Hard Negative Mining
Strategy for selecting training negatives that are difficult for the current model to distinguish from positives; critical for dense retrieval model quality beyond in-batch random negatives.
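One simple variant can be sketched as follows: score candidates with the current model and keep the top-ranked documents that are not labeled positive. The function name, the score dictionary, and the document ids are hypothetical.

```python
def mine_hard_negatives(model_scores, positives, n=2):
    """model_scores: {doc_id: score from the current retriever for one query}.
    Returns the n highest-scoring documents that are not labeled positive."""
    ranked = sorted(model_scores, key=model_scores.get, reverse=True)
    return [doc_id for doc_id in ranked if doc_id not in positives][:n]

# Made-up scores for a single query.
scores = {"d1": 0.92, "d2": 0.88, "d3": 0.15, "d4": 0.81}
hard_negatives = mine_hard_negatives(scores, positives={"d1"}, n=2)
```

Top-ranked unlabeled documents can be unjudged positives, so practical pipelines often filter or down-weight them, for example with cross-encoder scores.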
-
HyDE
Hypothetical Document Embeddings; generates a hypothetical answer to a query using an LLM and embeds that instead of the original query for zero-shot dense retrieval improvement.
-
In-Batch Negatives
Training technique where other (query, passage) pairs within the same mini-batch serve as negatives; free negative supervision that scales with batch size.
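A minimal sketch of the resulting loss, assuming dot-product scoring and a softmax cross-entropy objective. The function names and the temperature parameter are illustrative assumptions.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def in_batch_loss(queries, passages, temperature=1.0):
    """Mean softmax cross-entropy where passages[i] is the positive for queries[i]
    and every other passage in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(queries):
        scores = [dot(q, p) / temperature for p in passages]
        m = max(scores)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[i]
    return total / len(queries)

# Toy batch of two (query, passage) pairs with 2-dimensional embeddings.
queries = [[1.0, 0.0], [0.0, 1.0]]
passages = [[1.0, 0.0], [0.0, 1.0]]
aligned = in_batch_loss(queries, passages)
mismatched = in_batch_loss(queries, list(reversed(passages)))
```

With a batch of B pairs, each query sees B - 1 negatives at no extra encoding cost, which is why larger batches give more supervision.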
-
Knowledge Distillation for IR
Training a fast bi-encoder (student) to mimic the ranking scores of a slow cross-encoder (teacher); the dominant approach for improving dense retrieval without cross-encoder latency.
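As one concrete example of such an objective, the sketch below uses a Margin-MSE style loss, in which the student matches the teacher's score margin between a positive and a negative passage rather than its absolute scores. This is one option among several, and the function name and inputs are illustrative.

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student and teacher score margins.
    Each argument is a list with one score per (query, positive, negative) triple."""
    errors = [
        ((sp - sn) - (tp - tn)) ** 2
        for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)
    ]
    return sum(errors) / len(errors)

# The student may use a different score scale as long as the margins agree.
same_margin = margin_mse([3.0], [1.0], [10.0], [8.0])
wrong_margin = margin_mse([1.0], [1.0], [5.0], [2.0])
```

Matching margins rather than raw scores lets a dot-product student learn from a teacher whose logits live on a different scale.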
-
KNRM
Kernel-based Neural Ranking Model (2017); uses RBF kernels over the query-document term similarity matrix to produce soft-count features, end-to-end trainable including word embeddings.
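The kernel-pooling step can be sketched as below, assuming the similarity matrix has already been computed. The kernel means, the shared sigma, and the function name are illustrative choices; the learned embedding and final ranking layers are omitted.

```python
import math

def kernel_pooling(sim_matrix, mus, sigma=0.1):
    """sim_matrix[i][j]: cosine similarity of query term i and document term j.
    Returns one feature per kernel: the sum over query terms of the log soft count."""
    features = []
    for mu in mus:
        total = 0.0
        for row in sim_matrix:
            # RBF kernel: softly counts document terms whose similarity is near mu.
            soft_count = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_count, 1e-10))
        features.append(total)
    return features

# One query term against two document terms: an exact match and an unrelated term.
features = kernel_pooling([[1.0, 0.0]], mus=[1.0, 0.5, 0.0])
```

In the full model these features feed a learned linear layer, and gradients flow back through the kernels into the word embeddings.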
-
Listwise Ranking Loss
Ranking loss functions that optimize over the entire ranked list rather than individual pairs or points; includes LambdaLoss, ListNet, ApproxNDCG, and softmax cross-entropy.
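As an illustration, a ListNet-style top-one loss can be sketched as the cross-entropy between the softmax of the relevance labels and the softmax of the model scores. Function names are illustrative; a real implementation would operate on batched tensors.

```python
import math

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def listnet_loss(scores, labels):
    """Cross-entropy between the label distribution and the score distribution
    over all documents in one ranked list."""
    target = softmax(labels)
    predicted = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, predicted))

labels = [2.0, 1.0, 0.0]  # graded relevance for three documents
agreeing = listnet_loss([2.0, 1.0, 0.0], labels)
reversed_order = listnet_loss([0.0, 1.0, 2.0], labels)
```

The loss depends on all documents in the list at once, which is what distinguishes it from pointwise and pairwise objectives.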
-
LLM Rerankers (RankGPT)
Zero-shot document reranking using large language models prompted to produce a relevance-ordered permutation of candidate passages; no fine-tuning required.
-
MonoBERT
BERT-based pointwise reranker that concatenates query and passage for joint encoding; the standard baseline for neural reranking on MS MARCO.
-
MonoT5
T5-based pointwise reranker that generates “true”/“false” tokens to score relevance; more effective than MonoBERT, particularly with limited training data, and generalizes well across domains.
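The scoring step can be sketched as a softmax restricted to the logits of the two target tokens. The logit values are assumed to come from the model's first decoding step, and the function name is illustrative.

```python
import math

def monot5_score(true_logit, false_logit):
    """Softmax over only the "true" and "false" token logits; returns P("true")."""
    m = max(true_logit, false_logit)  # subtract the max for numerical stability
    e_true = math.exp(true_logit - m)
    e_false = math.exp(false_logit - m)
    return e_true / (e_true + e_false)

# Made-up logits for one query-passage pair.
score = monot5_score(2.0, -1.0)
```

Candidates are then sorted by this probability in descending order.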
-
MS MARCO
Microsoft MAchine Reading COmprehension dataset; the dominant benchmark for passage retrieval and document ranking, with 8.8M passages, about 1M anonymized Bing queries, and sparse binary relevance judgments.
-
PACRR
Position-Aware Convolutional-Recurrent Relevance (2017); captures positional and phrase-level query-document interactions via convolutions over the similarity matrix.
-
PLAID
Performance-optimized Late Interaction Driver; efficient serving engine for ColBERT using centroid-based candidate filtering to avoid full MaxSim computation over the entire index.
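For reference, the MaxSim score that PLAID avoids computing exhaustively can be sketched as follows, assuming token embeddings are given as lists of vectors. This shows only the scoring function, not PLAID's centroid filtering, and the function names are illustrative.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: for each query token take its best-matching
    document token, then sum those maxima over the query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy 2-dimensional token embeddings.
query_vecs = [[1.0, 0.0], [0.0, 1.0]]
doc_vecs = [[1.0, 0.0], [0.5, 0.5]]
score = maxsim(query_vecs, doc_vecs)
```

The cost grows with the number of query tokens times the number of document tokens, so pruning the candidate set before exact scoring matters at scale.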
-
Query2Doc
Expands queries by concatenating them with LLM-generated pseudo-documents before retrieval; improves both sparse and dense retrieval without modifying the index.
-
RankT5
T5-based reranker fine-tuned with pairwise or listwise ranking losses to output a numeric relevance score for each query-document pair directly, rather than generating “true”/“false” tokens as MonoT5 does.
-
REALM
Retrieval-Augmented Language Model Pretraining; jointly trains a retriever and language model by backpropagating through retrieval during masked language modeling pretraining.
-
SimCSE
Simple Contrastive Learning of Sentence Embeddings; learns high-quality sentence representations by treating two dropout-perturbed encodings of the same sentence as a positive pair (unsupervised) or by using NLI entailment pairs as positives (supervised).
-
TAS-B
Balanced Topic-Aware Sampling (the "B" stands for Balanced, not BERT); dense retrieval model trained on batches composed from query topic clusters and balanced by teacher score margin, combined with cross-encoder distillation, achieving strong recall with efficient inference.
-
Two-Stage Retrieval
Retrieve-then-rerank pipeline where a fast first-stage retriever (BM25 or bi-encoder) produces a candidate set, which a slower but more accurate reranker (cross-encoder) then orders.
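The pipeline shape can be sketched as below. Both scoring functions are toy stand-ins chosen only to make the example runnable (term overlap in place of BM25 or a bi-encoder, a length-normalized overlap in place of a cross-encoder), and all names are hypothetical.

```python
def retrieve_then_rerank(query, corpus, first_stage, reranker, k=100, top_n=10):
    """Stage 1: keep the k best documents under the cheap scorer.
    Stage 2: reorder only those k with the expensive scorer."""
    candidates = sorted(corpus, key=lambda doc: first_stage(query, doc), reverse=True)[:k]
    return sorted(candidates, key=lambda doc: reranker(query, doc), reverse=True)[:top_n]

def term_overlap(query, doc):
    # Toy stand-in for a first-stage retriever.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def toy_reranker(query, doc):
    # Toy stand-in for a cross-encoder: overlap normalized by document length.
    return term_overlap(query, doc) / len(doc.split())

corpus = ["dense retrieval with bert", "bert", "cooking pasta at home", "dense retrieval"]
results = retrieve_then_rerank("dense retrieval", corpus, term_overlap, toy_reranker, k=2, top_n=2)
```

The reranker only ever sees k documents, so its per-document cost can be orders of magnitude higher than the first stage's without dominating latency.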
-
uniCOIL
Variant of COIL (COntextualized Inverted List) that reduces each token's contextualized representation to a single scalar weight produced by a BERT encoder, bridging dense contextualization and sparse inverted index retrieval.