Neural IR

ANCE

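Approximate Nearest Neighbor Negative Contrastive Estimation; improves dense retrieval training by drawing hard negatives from an ANN index of the corpus that is refreshed asynchronously with embeddings from recent model checkpoints, so the negatives stay hard as training progresses.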
Atlas

Few-shot retrieval-augmented language model that pairs a Fusion-in-Decoder (FiD) T5 reader with a Contriever-based retriever; the two are jointly pretrained and fine-tuned, yielding strong few-shot performance on knowledge-intensive tasks.
BEIR

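Benchmarking IR; heterogeneous zero-shot benchmark of 18 retrieval datasets spanning 9 task types (e.g., fact checking, question answering, argument retrieval, biomedical IR), used to evaluate how well retrieval models, typically trained on MS MARCO, generalize to unseen tasks and domains.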
Binary Embeddings

Embeddings compressed to 1 bit per dimension (32x smaller than float32); similarity is computed as Hamming distance using XOR and integer POPCNT operations, dramatically reducing index size and retrieval latency.
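A minimal sketch of binarization and Hamming-distance search, assuming sign thresholding at zero (one common choice; other thresholds exist). Python integers stand in for packed machine words, and `bin(...).count("1")` stands in for a hardware POPCNT; all function names are illustrative.

```python
def binarize(vec):
    """Sign quantization: bit i is set when vec[i] > 0."""
    bits = 0
    for i, x in enumerate(vec):
        if x > 0:
            bits |= 1 << i
    return bits


def hamming(a, b):
    """Count differing bits; bin(...).count("1") stands in for POPCNT."""
    return bin(a ^ b).count("1")


def search(query_vec, doc_codes, k=2):
    """Rank documents by ascending Hamming distance to the binarized query."""
    q = binarize(query_vec)
    return sorted(range(len(doc_codes)), key=lambda i: hamming(q, doc_codes[i]))[:k]


docs = [[0.9, -0.2, 0.4, -0.7], [-0.5, 0.3, -0.1, 0.8], [0.7, -0.6, 0.2, 0.1]]
codes = [binarize(d) for d in docs]
print(search([0.8, -0.1, 0.5, -0.3], codes))  # [0, 2]
```

Each comparison is one XOR plus one popcount, which is why binary indexes scan so quickly.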
ColBERTv2

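Improved ColBERT combining denoised supervision via cross-encoder distillation with aggressive residual compression (each token vector stored as a centroid ID plus a low-bit quantized residual); reduces index size by roughly 6-10x while matching or exceeding v1 effectiveness.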
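The centroid-plus-residual idea can be sketched as below. This is a simplified illustration, not ColBERTv2's actual quantizer: the fixed `LEVELS` table is a hypothetical stand-in for the quantization buckets the real system derives from data, and centroids are given rather than learned by clustering.

```python
# Hypothetical 2-bit reconstruction levels per dimension (illustrative only).
LEVELS = [-0.3, -0.1, 0.1, 0.3]


def nearest_centroid(vec, centroids):
    """Index of the centroid with the smallest squared Euclidean distance."""
    return min(range(len(centroids)),
               key=lambda c: sum((v - m) ** 2 for v, m in zip(vec, centroids[c])))


def compress(vec, centroids):
    """Store a token vector as (centroid id, one level code per residual dimension)."""
    c = nearest_centroid(vec, centroids)
    codes = [min(range(len(LEVELS)), key=lambda j: abs((v - m) - LEVELS[j]))
             for v, m in zip(vec, centroids[c])]
    return c, codes


def decompress(c, codes, centroids):
    """Approximate reconstruction: centroid plus dequantized residual."""
    return [m + LEVELS[j] for m, j in zip(centroids[c], codes)]


centroids = [[1.0, 0.0], [0.0, 1.0]]
c, codes = compress([0.9, 0.35], centroids)
print(c, codes, decompress(c, codes, centroids))
```

Only the centroid ID and a few bits per dimension are stored, instead of a full float vector per token.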
Contrastive Loss

Training objective that pulls similar pairs together and pushes dissimilar pairs apart in embedding space; the dominant loss function for dense retrieval and sentence embedding models.
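One widely used instance is the InfoNCE loss, sketched below for a single query with one positive and a list of negatives. Dot-product similarity and the default temperature of 0.05 are assumptions; other similarity functions and temperatures are common.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce(query, positive, negatives, temperature=0.05):
    """InfoNCE: -log softmax probability of the positive among {positive} + negatives."""
    logits = [dot(query, positive) / temperature]
    logits += [dot(query, n) / temperature for n in negatives]
    m = max(logits)  # subtract the max before exp for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[0]


q, pos = [1.0, 0.0], [1.0, 0.0]
near_neg, far_neg = [0.0, 1.0], [-1.0, 0.0]
print(info_nce(q, pos, [near_neg], temperature=1.0))
print(info_nce(q, pos, [far_neg], temperature=1.0))
```

The loss is lower when the negative is farther from the query, which is the "push dissimilar pairs apart" pressure.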
Contriever

Unsupervised dense retrieval model trained with contrastive learning on unlabeled text; no labeled query-passage pairs required.
DeepImpact

Learns per-term impact scores for documents using a BERT encoder, enabling semantic-aware scoring with a standard inverted index without query expansion.
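A minimal sketch of impact-based scoring over an inverted index: each posting stores a precomputed per-term impact, and a document's score is the sum of impacts for the query terms it contains (query terms themselves are unweighted). The impact values below are made up and stand in for BERT-predicted scores.

```python
from collections import defaultdict


def build_index(doc_impacts):
    """Inverted index: term -> list of (doc_id, impact) postings."""
    index = defaultdict(list)
    for doc_id, impacts in doc_impacts.items():
        for term, weight in impacts.items():
            index[term].append((doc_id, weight))
    return index


def search(index, query_terms):
    """Sum the stored impacts of each distinct query term found in each document."""
    scores = defaultdict(float)
    for term in set(query_terms):
        for doc_id, weight in index.get(term, []):
            scores[doc_id] += weight
    return sorted(scores.items(), key=lambda kv: -kv[1])


# Made-up impact values standing in for model predictions.
doc_impacts = {
    "d1": {"neural": 2.5, "ranking": 1.8},
    "d2": {"neural": 0.9, "network": 2.1},
}
index = build_index(doc_impacts)
print(search(index, ["neural", "ranking"]))
```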
DocT5Query

Document expansion via T5 query generation; generates synthetic queries a document might answer and appends them to the document before indexing, improving sparse retrieval recall.
DPR (Dense Passage Retrieval)

Dual BERT encoder model that retrieves passages by embedding queries and documents into a shared dense vector space; foundational bi-encoder for open-domain QA.
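Once passages are embedded offline, retrieval reduces to maximum-inner-product search. The sketch below uses exact search over toy vectors for clarity; the vectors stand in for the outputs of the question and passage encoders, and large corpora would use an approximate index instead.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def dense_retrieve(query_emb, passage_embs, k=2):
    """Exact maximum-inner-product search: indices of the k highest-scoring passages."""
    return heapq.nlargest(k, range(len(passage_embs)),
                          key=lambda i: dot(query_emb, passage_embs[i]))


# Toy vectors standing in for encoder outputs.
passages = [[0.2, 0.1], [1.0, 1.0], [0.9, 0.0]]
print(dense_retrieve([1.0, 0.5], passages))  # [1, 2]
```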
DRMM

Deep Relevance Matching Model (2016); interaction-based neural ranker using histogram-based local interaction features with term gating, designed explicitly for relevance matching rather than semantic similarity.
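The matching-histogram input for one query term can be sketched as follows: cosine similarities with every document term are bucketed into fixed bins, exact matches get their own bin, and counts are log-scaled. The four-bins-plus-exact-match layout is an assumption; the feed-forward network and term gating that consume these histograms are not shown.

```python
import math


def cosine(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def matching_histogram(q_vec, doc_vecs, num_bins=4):
    """Log-count histogram: num_bins equal bins over [-1, 1) plus an exact-match bin."""
    counts = [0] * (num_bins + 1)
    for d_vec in doc_vecs:
        s = cosine(q_vec, d_vec)
        if s >= 1.0 - 1e-9:
            counts[-1] += 1  # exact match gets its own bin
        else:
            idx = int((s + 1.0) / 2.0 * num_bins)
            counts[min(max(idx, 0), num_bins - 1)] += 1
    return [math.log1p(c) for c in counts]


# One query-term embedding against four document-term embeddings.
print(matching_histogram([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [1.0, 1.0]]))
```

Because the histogram discards term positions, it encodes how strongly a query term is matched, not where.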
DSI (Differentiable Search Index)

Encodes an entire document corpus into a single seq2seq model; retrieval is performed by generating document identifiers directly from a query, without a separate index.
DSSM

Deep Structured Semantic Model (2013); the original neural dual-encoder for web search, using letter-trigram word hashing as input and MLP towers to learn query-document semantic similarity.
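Word hashing breaks each word, padded with boundary markers, into letter trigrams, and represents text as a bag of trigram counts. A minimal sketch (mapping trigrams to vector positions for the MLP input is omitted):

```python
from collections import Counter


def letter_trigrams(word):
    """Pad with '#' boundary markers and slide a 3-character window."""
    padded = "#" + word.lower() + "#"
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def word_hash(text):
    """Bag of letter-trigram counts for a whitespace-tokenized text."""
    counts = Counter()
    for word in text.split():
        counts.update(letter_trigrams(word))
    return counts


print(letter_trigrams("good"))  # ['#go', 'goo', 'ood', 'od#']
print(word_hash("good food"))
```

The trigram vocabulary is far smaller than a word vocabulary and tolerates misspellings and unseen words.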
DUET

Dual network combining local (exact-match) and distributed (semantic) sub-models for relevance ranking; one of the first models to explicitly combine lexical and semantic signals.
FiD (Fusion-in-Decoder)

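Encodes each retrieved passage, concatenated with the question, independently with T5, then lets the decoder attend over all passage representations jointly; more scalable than concatenating all passages into one long input, since encoder self-attention cost grows linearly with the number of passages instead of quadratically with the total input length.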
Hard Negative Mining

Strategy for selecting training negatives that are difficult for the current model to distinguish from positives; critical for dense retrieval model quality beyond in-batch random negatives.
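A minimal sketch of mining hard negatives from a ranked list, assuming the ranking comes from some retriever (e.g., BM25, or the model being trained, as in ANCE). The `skip_top` knob is a hypothetical parameter illustrating a common precaution against unlabeled positives (false negatives) near the top of the list.

```python
def mine_hard_negatives(ranked_ids, positive_ids, k=3, skip_top=0):
    """Return up to k top-ranked passages that are not labeled positive.

    skip_top (hypothetical) drops the first few ranks to reduce the risk
    of selecting relevant-but-unlabeled passages as negatives.
    """
    positives = set(positive_ids)
    candidates = [pid for pid in ranked_ids[skip_top:] if pid not in positives]
    return candidates[:k]


ranked = ["p7", "p2", "p9", "p4", "p1"]  # retriever output for one query
print(mine_hard_negatives(ranked, ["p2"]))              # ['p7', 'p9', 'p4']
print(mine_hard_negatives(ranked, ["p2"], skip_top=1))  # ['p9', 'p4', 'p1']
```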
HyDE

Hypothetical Document Embeddings; prompts an LLM to write a hypothetical document answering the query, then embeds that document in place of the original query, improving zero-shot dense retrieval.
In-Batch Negatives

Training technique where other (query, passage) pairs within the same mini-batch serve as negatives; free negative supervision that scales with batch size.
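With in-batch negatives, each row of the query-passage score matrix is a softmax classification whose correct label is the diagonal entry. A minimal sketch with dot-product scores (the temperature default is an assumption):

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def in_batch_loss(query_embs, passage_embs, temperature=1.0):
    """Mean cross-entropy over the batch score matrix: passage i is the positive
    for query i, and every other passage in the batch acts as a negative."""
    total = 0.0
    for i, q in enumerate(query_embs):
        logits = [dot(q, p) / temperature for p in passage_embs]  # row i of Q @ P^T
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_z - logits[i]
    return total / len(query_embs)


queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[1.0, 0.0], [0.0, 1.0]]  # passage i matches query i
swapped = [[0.0, 1.0], [1.0, 0.0]]  # positives mismatched
print(in_batch_loss(queries, aligned), in_batch_loss(queries, swapped))
```

A batch of size B gives each query B - 1 negatives at no extra encoding cost, since the passages are encoded anyway.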
Knowledge Distillation for IR

Training a fast bi-encoder (student) to mimic the ranking scores of a slow cross-encoder (teacher); the dominant approach for improving dense retrieval without cross-encoder latency.
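The entry does not fix a particular distillation loss; one common choice is Margin-MSE, which trains the student to reproduce the teacher's score margin between a positive and a negative passage rather than the raw scores. A minimal sketch over a batch of score lists:

```python
def margin_mse(student_pos, student_neg, teacher_pos, teacher_neg):
    """Mean squared error between student margins (pos - neg) and teacher margins."""
    n = len(student_pos)
    return sum(((sp - sn) - (tp - tn)) ** 2
               for sp, sn, tp, tn in zip(student_pos, student_neg, teacher_pos, teacher_neg)) / n


# Student margin 2.0 vs. teacher margin 4.0 -> squared error 4.0.
print(margin_mse([5.0], [3.0], [10.0], [6.0]))
```

Matching margins lets the student's scores live on a different scale from the cross-encoder's, which suits a dot-product bi-encoder.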
KNRM

Kernel-based Neural Ranking Model (2017); uses RBF kernels over the query-document term similarity matrix to produce soft-count features, end-to-end trainable including word embeddings.
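Kernel pooling turns each row of the similarity matrix (one query term) into soft match counts, one per RBF kernel, then sums the logs over query terms. A minimal sketch; the kernel means and widths passed in are assumptions (typical setups use an exact-match kernel at mu = 1 with a very small sigma plus several soft-match kernels), and the learned layer that maps these features to a score is not shown.

```python
import math


def kernel_pooling(sim_matrix, mus, sigmas):
    """One feature per kernel: sum over query terms of log(soft term frequency)."""
    features = []
    for mu, sigma in zip(mus, sigmas):
        total = 0.0
        for row in sim_matrix:  # one row per query term
            soft_tf = sum(math.exp(-((s - mu) ** 2) / (2 * sigma ** 2)) for s in row)
            total += math.log(max(soft_tf, 1e-10))  # clamp to avoid log(0)
        features.append(total)
    return features


# One query term: two exact matches and one unrelated document term.
sim = [[1.0, 1.0, 0.0]]
print(kernel_pooling(sim, mus=[1.0, 0.0], sigmas=[1e-3, 0.1]))
```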
Listwise Ranking Loss

Ranking loss functions that optimize over the entire ranked list rather than individual pairs or points; includes LambdaLoss, ListNet, ApproxNDCG, and softmax cross-entropy.
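The softmax cross-entropy variant can be sketched as below: scores for the whole candidate list pass through one softmax, and the loss is the cross-entropy against normalized relevance labels (ListNet's top-one formulation is closely related). It assumes at least one label is nonzero.

```python
import math


def softmax_cross_entropy(scores, labels):
    """Listwise loss: cross-entropy between normalized labels and softmax(scores)."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    total_label = sum(labels)  # assumed > 0
    return -sum((y / total_label) * (s - log_z) for s, y in zip(scores, labels) if y > 0)


scores = [2.0, 1.0, 0.0]
print(softmax_cross_entropy(scores, [1, 0, 0]))  # relevant item ranked first: low loss
print(softmax_cross_entropy(scores, [0, 0, 1]))  # relevant item ranked last: high loss
```

Because the normalizer covers every candidate, raising one document's score necessarily lowers the others' probabilities, which is what makes the loss listwise.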
LLM Rerankers (RankGPT)

Zero-shot document reranking using large language models prompted to produce a relevance-ordered permutation of candidate passages; no fine-tuning required.
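The model's reply has to be turned back into an ordering. The sketch below assumes candidates are labeled [1], [2], ... in the prompt and that the LLM answers in a bracketed form such as "[2] > [3] > [1]"; the prompt itself and any windowing over long candidate lists are not shown. Duplicates and out-of-range IDs are dropped, and omitted candidates keep their original relative order at the end.

```python
import re


def parse_permutation(response, num_candidates):
    """Turn a reply like '[2] > [3] > [1]' into 0-based candidate indices."""
    order = []
    for match in re.findall(r"\[(\d+)\]", response):
        idx = int(match) - 1  # prompt identifiers assumed to be 1-based
        if 0 <= idx < num_candidates and idx not in order:
            order.append(idx)  # skip out-of-range and duplicate IDs
    # Append candidates the model omitted, keeping their original order.
    order.extend(i for i in range(num_candidates) if i not in order)
    return order


print(parse_permutation("[2] > [3] > [1]", 3))  # [1, 2, 0]
```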
MonoBERT

BERT-based pointwise reranker that concatenates query and passage for joint encoding; the standard baseline for neural reranking on MS MARCO.
MonoT5

T5-based pointwise reranker that generates “true”/“false” tokens, scoring relevance by the probability of “true”; outperforms MonoBERT, especially with limited training data, and generalizes well across domains.
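Scoring reduces to a two-way softmax over the decoder's logits for the "true" and "false" tokens. A minimal sketch; the input template follows the one commonly used with monoT5 but should be treated as an assumption for any given checkpoint, and model loading and inference are not shown (the logits are passed in directly).

```python
import math


def monot5_input(query, document):
    """Input template commonly used with monoT5 (assumed here)."""
    return f"Query: {query} Document: {document} Relevant:"


def relevance_score(logit_true, logit_false):
    """P('true') from a softmax over only the 'true' and 'false' token logits."""
    m = max(logit_true, logit_false)
    e_true = math.exp(logit_true - m)
    e_false = math.exp(logit_false - m)
    return e_true / (e_true + e_false)


print(monot5_input("what is bm25", "BM25 is a ranking function."))
print(relevance_score(2.0, -1.0))
```

Candidates are then sorted by this score; restricting the softmax to two tokens ignores probability mass on the rest of the vocabulary.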
MS MARCO

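Microsoft MAchine Reading COmprehension dataset; the dominant benchmark for passage retrieval and document ranking, with about 8.8M passages, roughly 500K labeled training queries (out of over 1M queries in total), and sparse binary relevance judgments (typically about one relevant passage per query).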
PACRR

Position-Aware Convolutional-Recurrent Relevance (2017); captures positional and phrase-level query-document interactions via convolutions over the similarity matrix.
PLAID

Performance-optimized Late Interaction Driver; efficient serving engine for ColBERT using centroid-based candidate filtering to avoid full MaxSim computation over the entire index.
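The underlying idea can be sketched in two steps: score every document cheaply by replacing each token vector with its centroid, keep the top candidates, and run exact MaxSim only on those. This is a toy illustration, not PLAID's actual multi-stage pipeline (which also prunes centroids and works on compressed residuals); `keep` is a hypothetical parameter and the data are made up.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def maxsim(query_vecs, doc_vecs):
    """Late interaction: each query token takes its best-matching doc token; sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def centroid_score(query_vecs, doc_centroid_ids, centroids):
    """Cheap approximation: score against each doc token's centroid, not its full vector."""
    return sum(max(dot(q, centroids[c]) for c in doc_centroid_ids) for q in query_vecs)


def search(query_vecs, docs, centroids, keep=2):
    """Filter with centroid scores, then rerank the survivors with exact MaxSim."""
    approx = sorted(docs, key=lambda d: -centroid_score(query_vecs, docs[d][1], centroids))
    survivors = approx[:keep]
    return sorted(survivors, key=lambda d: -maxsim(query_vecs, docs[d][0]))


centroids = [[1.0, 0.0], [0.0, 1.0]]
# doc_id -> (token vectors, centroid id of each token); toy data.
docs = {
    "a": ([[0.9, 0.1]], [0]),
    "b": ([[0.1, 0.9]], [1]),
    "c": ([[0.8, 0.3], [0.2, 0.7]], [0, 1]),
}
print(search([[1.0, 0.0]], docs, centroids))  # ['a', 'c']
```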
Query2Doc

Expands queries by appending LLM-generated pseudo-documents to them before retrieval; improves both sparse and dense retrieval without modifying the index or retrieval model.
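A minimal sketch of building the expanded query string. The repeat count and separator token are illustrative assumptions: repeating the query keeps its original terms from being drowned out by the much longer pseudo-document under BM25, while a dense encoder can take a single query joined to the pseudo-document. The LLM call that produces the pseudo-document is not shown.

```python
def expand_for_sparse(query, pseudo_doc, repeats=5):
    """Repeat the query, then append the pseudo-document (for BM25-style retrieval)."""
    return " ".join([query] * repeats + [pseudo_doc])


def expand_for_dense(query, pseudo_doc, sep="[SEP]"):
    """Join query and pseudo-document with a separator token (for a dense encoder)."""
    return f"{query} {sep} {pseudo_doc}"


# Stand-in for an LLM-generated pseudo-document.
pseudo = "BM25 is a probabilistic ranking function based on term frequency and document length."
print(expand_for_sparse("what is bm25", pseudo, repeats=2))
print(expand_for_dense("what is bm25", pseudo))
```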
RankT5

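T5-based reranker that outputs a numerical relevance score for each query-document pair and is fine-tuned directly with ranking losses (e.g., listwise softmax cross-entropy) instead of text-generation targets; reported to outperform generation-based rerankers such as MonoT5.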
REALM

Retrieval-Augmented Language Model Pretraining; jointly trains a retriever and language model by backpropagating through retrieval during masked language modeling pretraining.
SimCSE

Simple Contrastive Sentence Embeddings; learns high-quality sentence representations via dropout-based augmentation (unsupervised) or NLI entailment pairs (supervised).
TAS-B

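Topic Aware Sampling, Balanced; BERT-based dense retriever trained by composing each batch from queries in the same topic cluster, with positive-negative pairs balanced by teacher score margin, plus cross-encoder distillation; achieves strong recall with efficient training and inference.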
Two-Stage Retrieval

Retrieve-then-rerank pipeline where a fast first-stage retriever (BM25 or bi-encoder) produces a candidate set, which a slower but more accurate reranker (cross-encoder) then orders.
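The pipeline shape can be sketched with pluggable scorers. Both scorers below are hypothetical toys: `overlap` stands in for a cheap first stage such as BM25, and `density` stands in for an expensive cross-encoder; only the top `k` first-stage candidates ever reach the reranker.

```python
def two_stage(query, corpus, first_stage, reranker, k=100, n=10):
    """Keep the top-k docs by the cheap scorer, reorder them with the expensive one."""
    candidates = sorted(corpus, key=lambda d: -first_stage(query, corpus[d]))[:k]
    return sorted(candidates, key=lambda d: -reranker(query, corpus[d]))[:n]


def overlap(query, doc):
    """Toy first-stage scorer: number of shared terms."""
    return len(set(query.split()) & set(doc.split()))


def density(query, doc):
    """Toy stand-in for a cross-encoder: shared terms per document word."""
    return overlap(query, doc) / len(doc.split())


corpus = {
    "d4": "neural networks for ranking documents",
    "d1": "neural ranking",
    "d2": "ranking with bm25",
    "d3": "cooking recipes",
}
print(two_stage("neural ranking", corpus, overlap, density, k=2, n=2))  # ['d1', 'd4']
```

The reranker's cost scales with `k`, not with corpus size, which is what makes an expensive cross-encoder affordable.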
uniCOIL

Simplified variant of COIL (COntextualized Inverted List) that reduces each token's contextualized representation to a single scalar weight produced by a BERT encoder, bridging dense contextualization and sparse inverted index retrieval.