Bi-Encoder
What it is
A bi-encoder (also called a dual encoder or siamese architecture) uses two identical or similar neural encoders to embed queries and documents independently into a shared vector space. Because the two sides are encoded separately, document embeddings can be pre-computed offline and matched against the query with fast nearest-neighbour search at query time, making the architecture well suited to large-scale dense retrieval.
[illustrate: Query and document flowing through separate but identical encoder networks; resulting vectors in shared embedding space; query compared against document vectors]
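The defining property, sketched below under the assumption of plain dot-product similarity (function and variable names are illustrative), is that the two inputs never see each other during encoding: each side is reduced to a vector first, and relevance is just a similarity between those vectors.

# Each side is embedded independently; relevance is a similarity in the shared space.
import numpy as np

def score(query_vec: np.ndarray, doc_vec: np.ndarray) -> float:
    # The document vector does not depend on the query, so it can be
    # computed and stored long before any query arrives.
    return float(np.dot(query_vec, doc_vec))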
How it works
- Offline indexing:
  - Encode all documents using the encoder network
  - Store embeddings in an ANN index (HNSW, IVF, etc.)
- Online retrieval:
  - Encode query using the same encoder
  - Search ANN index to find nearest document embeddings
  - Return top-k documents ranked by embedding similarity (see the indexing sketch after this list)
- Training:
  - Supervised: fine-tune on (query, relevant_doc, irrelevant_doc) triples
  - Contrastive: maximize similarity for positive pairs, minimize for negatives
  - In-batch negatives: use other batch documents as negatives (see the loss sketch after this list)
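A minimal sketch of the offline/online split, assuming FAISS for the ANN index; the vectors below are random stand-ins for real encoder outputs, and every name is illustrative rather than prescribed.

# Offline/online split with an HNSW index in FAISS (random vectors stand in
# for real document and query embeddings produced by the encoder).
import faiss
import numpy as np

dim, n_docs = 768, 10_000
doc_vecs = np.random.rand(n_docs, dim).astype(np.float32)
faiss.normalize_L2(doc_vecs)                       # cosine similarity via inner product

# Offline indexing: build the ANN index once from pre-computed document vectors
index = faiss.IndexHNSWFlat(dim, 32, faiss.METRIC_INNER_PRODUCT)
index.add(doc_vecs)

# Online retrieval: encode only the query, then run nearest-neighbour search
query_vec = np.random.rand(1, dim).astype(np.float32)
faiss.normalize_L2(query_vec)
scores, ids = index.search(query_vec, 5)           # top-5 document ids and similarities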
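For training, a sketch of the contrastive objective with in-batch negatives, assuming PyTorch; query_vecs and doc_vecs are the two towers' outputs for a batch of (query, relevant_doc) pairs, and the temperature value is illustrative.

# InfoNCE-style loss with in-batch negatives: for each query, its paired document
# is the positive and every other document in the batch serves as a negative.
import torch
import torch.nn.functional as F

def in_batch_contrastive_loss(query_vecs, doc_vecs, temperature=0.05):
    query_vecs = F.normalize(query_vecs, dim=-1)
    doc_vecs = F.normalize(doc_vecs, dim=-1)
    sims = query_vecs @ doc_vecs.T / temperature              # (batch, batch) similarity matrix
    labels = torch.arange(sims.size(0), device=sims.device)   # positives lie on the diagonal
    return F.cross_entropy(sims, labels)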
Example
# Runnable version of this sketch using sentence-transformers;
# the model name is illustrative (any sentence encoder works the same way)
from sentence_transformers import SentenceTransformer
import numpy as np

encoder = SentenceTransformer("all-mpnet-base-v2")  # 768-dim sentence encoder

# Pre-indexed documents: encoded once, offline
docs = ["ML best practices guide", "Deep learning tutorials"]
doc_vecs = encoder.encode(docs, normalize_embeddings=True)

# Query encoded at search time with the same encoder
query_vec = encoder.encode("best practices for machine learning", normalize_embeddings=True)

# Retrieve nearest documents by dot-product similarity; no re-encoding of documents at query time (fast!)
ranked = [docs[i] for i in np.argsort(-(doc_vecs @ query_vec))]
Variants and history
Siamese networks date to the 1990s, where they were introduced for signature verification and later applied to face verification. Sentence-BERT (SBERT, 2019) paired siamese BERT encoders with contrastive-style training objectives to produce sentence embeddings, and Dense Passage Retrieval (DPR, 2020) popularized bi-encoders for information retrieval. Modern variants include multi-vector bi-encoders (ColBERT-style) and instruction-tuned bi-encoders that adapt to different task descriptions.
When to use it
Choose bi-encoders when:
- You need sub-100ms retrieval latency on large collections
- Document set is static or infrequently updated
- You can afford offline indexing and ANN storage
- Speed is more important than maximum ranking accuracy
- You’ll combine with reranking for final ranking
Bi-encoders are the workhorse of production dense retrieval: fast, but limited to scoring queries against pre-computed document embeddings, with no interaction between query and document during encoding. Use cross-encoders or rerankers for fine-grained second-stage ranking.