Bi-Encoder

What it is

A bi-encoder (also called a dual encoder, or a siamese architecture when the two towers share weights) uses two identical or similar neural encoders to embed queries and documents independently into a shared vector space. Because the two inputs are encoded separately, document embeddings can be pre-computed and searched with fast nearest-neighbour lookup at query time, making the architecture ideal for large-scale dense retrieval.

[illustrate: Query and document flowing through separate but identical encoder networks; resulting vectors in shared embedding space; query compared against document vectors]

How it works

  1. Offline indexing:

    • Encode all documents using the encoder network
    • Store embeddings in an ANN index (HNSW, IVF, etc.)
  2. Online retrieval:

    • Encode query using the same encoder
    • Search ANN index to find nearest document embeddings
    • Return top-k documents ranked by embedding similarity
  3. Training:

    • Supervised: fine-tune on (query, relevant_doc, irrelevant_doc) triples
    • Contrastive: maximize similarity for positive pairs, minimize for negatives
    • In-batch negatives: use other batch documents as negatives
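The contrastive objective with in-batch negatives can be sketched in plain numpy. This is an illustrative loss computation on random vectors, not a real training loop; in practice the vectors would come from the encoder and gradients would flow through it:

```python
import numpy as np

def in_batch_contrastive_loss(q_vecs, d_vecs, temperature=0.05):
    """Row i of the similarity matrix scores query i against every document
    in the batch; the diagonal entries are the positive pairs. This is
    softmax cross-entropy against the diagonal (InfoNCE)."""
    sims = (q_vecs @ d_vecs.T) / temperature
    sims = sims - sims.max(axis=1, keepdims=True)          # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
queries = l2_normalize(rng.normal(size=(4, 8)))
positives = l2_normalize(queries + 0.1 * rng.normal(size=(4, 8)))  # near-duplicates
randoms = l2_normalize(rng.normal(size=(4, 8)))                    # unrelated docs

# Matched (query, doc) pairs should incur a much lower loss than random docs
loss_aligned = in_batch_contrastive_loss(queries, positives)
loss_random = in_batch_contrastive_loss(queries, randoms)
```

In-batch negatives make every other document in the batch serve double duty as a negative for each query, which is why larger batches tend to give a stronger contrastive signal.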

Example

# Encoder: a 768-dim Sentence-Transformer model
# (assumes the sentence-transformers package; any bi-encoder checkpoint works)
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings

# Pre-indexed documents: encoded once, offline
docs = ["ML best practices guide", "Deep learning tutorials"]
doc_vecs = model.encode(docs)  # shape (2, 768), already computed

# At query time, encode only the query and rank by dot-product similarity
query_vec = model.encode("best practices for machine learning")
best = docs[int(np.argmax(doc_vecs @ query_vec))]
# No need to re-encode documents at query time (fast!)

Variants and history

Siamese networks date to the 1990s, where they were introduced for signature verification. Sentence-BERT (SBERT, 2019) combined pretrained sentence encoders with siamese fine-tuning objectives, and Dense Passage Retrieval (DPR, 2020) popularized bi-encoders for large-scale information retrieval. Modern variants include multi-vector bi-encoders (ColBERT-style, which keep one embedding per token) and instruction-tuned bi-encoders that adapt to different task descriptions.
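ColBERT-style multi-vector scoring ("late interaction", or MaxSim) can be sketched in a few lines of numpy. The token embeddings here are random and purely illustrative; a real system would produce them with a per-token encoder:

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style MaxSim: for each query token embedding, take its best
    match over all document token embeddings, then sum across query tokens."""
    sims = query_tokens @ doc_tokens.T    # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()         # best doc token per query token

rng = np.random.default_rng(1)
q = rng.normal(size=(3, 16))  # 3 query tokens, 16-dim each (illustrative)

# A document containing near-matches for every query token, plus extra tokens
doc_match = np.vstack([q + 0.01 * rng.normal(size=(3, 16)),
                       rng.normal(size=(2, 16))])
# An unrelated document of 5 random tokens
doc_other = rng.normal(size=(5, 16))
```

Unlike a single-vector bi-encoder, MaxSim preserves token-level matching while still allowing document token embeddings to be pre-computed.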

When to use it

Choose bi-encoders when:

  • You need sub-100ms retrieval latency on large collections
  • Document set is static or infrequently updated
  • You can afford offline indexing and ANN storage
  • Speed is more important than maximum ranking accuracy
  • You’ll combine with reranking for final ranking

Bi-encoders are the workhorse of production dense retrieval: fast, but limited to similarity over pre-computed document embeddings. Use cross-encoders or rerankers for fine-grained second-stage ranking.
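That two-stage pattern can be sketched end to end. The `overlap_score` function below is a hypothetical stand-in for a real cross-encoder, which would jointly encode the query and each candidate document:

```python
import numpy as np

def retrieve_then_rerank(query, query_vec, docs, doc_vecs, score_fn, k=2):
    # Stage 1 (bi-encoder): cheap dot products over precomputed doc vectors
    candidates = np.argsort(doc_vecs @ query_vec)[::-1][:k]
    # Stage 2 (cross-encoder): score_fn sees query and document together
    order = sorted(candidates, key=lambda i: score_fn(query, docs[i]),
                   reverse=True)
    return [docs[i] for i in order]

# Hypothetical stand-in for a cross-encoder: shared-word count
def overlap_score(q, d):
    return len(set(q.lower().split()) & set(d.lower().split()))

docs = ["ML guide", "cooking tips", "ML best practices"]
doc_vecs = np.array([[0.9, 0.0], [0.1, 0.0], [0.8, 0.0]])  # toy embeddings
result = retrieve_then_rerank("ML best practices", np.array([1.0, 0.0]),
                              docs, doc_vecs, overlap_score)
```

The key property is that the expensive pairwise scorer only ever sees the k bi-encoder candidates, not the whole collection.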

See also