Dense Retrieval
What it is
Dense retrieval finds relevant documents by embedding queries and documents into a shared continuous vector space, then ranking by vector similarity. A query embedding is compared against document embeddings (exhaustively, or via an index) using nearest-neighbour search. Dense methods capture semantic relatedness beyond exact keyword matching.
[illustrate: Embedding space with query point (red) surrounded by relevant documents (green) and irrelevant documents (gray); ANN index overlay showing search trajectory]
How it works
Offline indexing:
- Encode each document or passage into a dense vector using a shared encoder (e.g., Sentence-BERT)
- Store vectors in an approximate nearest-neighbour (ANN) index (HNSW, IVF, etc.)
Online retrieval:
- Encode query using the same encoder
- Search ANN index to retrieve top-k nearest neighbours
- Return ranked list of documents
Scoring: typically dot-product or cosine similarity between the query and document vectors
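The indexing and retrieval steps above can be sketched in a few lines. This is a minimal illustration with made-up 3-dimensional embeddings and exhaustive search standing in for an ANN index; real systems use a learned encoder and hundreds of dimensions.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product divided by the product of L2 norms.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Offline: a toy "index" of precomputed document embeddings.
# In practice these come from the shared encoder and live in an ANN index.
index = {
    "doc_1": [0.9, 0.1, 0.0],
    "doc_2": [0.7, 0.6, 0.1],
    "doc_3": [0.0, 0.1, 0.9],
}

def retrieve(query_vec, index, k=2):
    # Online: score every document (exhaustive search stands in for ANN here)
    # and return the top-k document ids by similarity.
    scored = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

q = [1.0, 0.2, 0.0]  # would come from encoding the query text
print(retrieve(q, index))  # ['doc_1', 'doc_2']
```

Swapping the brute-force loop for an ANN library (HNSW, IVF) changes only the `retrieve` step; the scoring and the offline/online split stay the same.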
Example
Query: "benefits of regular exercise"
→ encode to vector q
Documents indexed:
"Physical fitness improves health" → doc_1_vec (similarity: 0.87)
"Exercise and mental wellbeing" → doc_2_vec (similarity: 0.85)
"History of Ancient Rome" → doc_3_vec (similarity: 0.12)
Top-1 result: doc_1
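A practical detail behind scores like the ones above: if embeddings are L2-normalized at index time, dot product and cosine similarity coincide, so a fast inner-product index is sufficient. A small check with assumed 2-d vectors:

```python
import math

def normalize(v):
    # Scale a vector to unit L2 norm.
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

q = normalize([3.0, 4.0])
d = normalize([4.0, 3.0])
# For unit vectors, the dot product IS the cosine similarity.
print(dot(q, d))  # 24/25 = 0.96
```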
Variants and history
Dense retrieval became practical around 2019–2020 with improvements in encoder quality and ANN indexes. Early systems such as DPR and ColBERT showed that dense retrieval can outperform BM25 on MS MARCO and other benchmarks. Hybrid search combines dense and sparse methods. Modern variants include late-interaction models (ColBERT), multi-representation systems, and instruction-tuned encoders. Matryoshka representation learning reduces storage and search cost by making truncated embeddings usable on their own.
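The Matryoshka idea can be sketched in one function: keep only the first d dimensions of an embedding and renormalize. This assumes the encoder was trained so that prefixes of the vector remain useful; the vector below is hypothetical.

```python
import math

def truncate_embedding(v, d):
    # Keep the first d dimensions, then renormalize so cosine/dot
    # scoring still behaves as expected on the shorter vector.
    prefix = v[:d]
    n = math.sqrt(sum(x * x for x in prefix))
    return [x / n for x in prefix]

full = [3.0, 4.0, 0.2, -0.1]        # hypothetical 4-d embedding
small = truncate_embedding(full, 2)  # half the storage and search cost
print(small)  # [0.6, 0.8]
```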
When to use it
Choose dense retrieval when:
- Semantic matching matters more than exact keywords
- You have resources for ANN indexing
- Query and document distributions are similar
- Latency budgets allow nearest-neighbour search (~10–100ms at scale)
- You want to capture paraphrases and synonymy
Dense methods are slower at query time than inverted-index BM25 and require selecting, and often fine-tuning, an embedding model. Hybrid search often balances accuracy and efficiency.
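One common way to combine dense and sparse results is reciprocal rank fusion (RRF), which needs only the two ranked lists, not comparable scores. A minimal sketch with hypothetical rankings (k=60 is a common default):

```python
def rrf_fuse(rankings, k=60):
    # Reciprocal rank fusion: each input ranking contributes 1 / (k + rank)
    # for every document it contains; higher fused score ranks first.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking = ["doc_2", "doc_1", "doc_3"]   # sparse/keyword results
dense_ranking = ["doc_1", "doc_4", "doc_2"]  # dense/semantic results
print(rrf_fuse([bm25_ranking, dense_ranking]))
# ['doc_1', 'doc_2', 'doc_4', 'doc_3']
```

Because RRF uses only ranks, it sidesteps the problem that BM25 scores and cosine similarities live on different scales.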