Bi-Encoder

What it is

A bi-encoder (also called a dual encoder, or a siamese architecture when the two towers share weights) uses two identical or similar neural encoders to embed queries and documents independently into a shared vector space. Because the two inputs are encoded separately, document embeddings can be pre-computed and searched with fast nearest-neighbour lookup at query time, making the architecture ideal for large-scale dense retrieval.

[illustrate: Query and document flowing through separate but identical encoder networks; resulting vectors in shared embedding space; query compared against document vectors]

How it works

  1. Offline indexing:

    • Encode all documents using the encoder network
    • Store embeddings in an ANN index (HNSW, IVF, etc.)
  2. Online retrieval:

    • Encode query using the same encoder
    • Search ANN index to find nearest document embeddings
    • Return top-k documents ranked by embedding similarity
  3. Training:

    • Supervised: fine-tune on (query, relevant_doc, irrelevant_doc) triples
    • Contrastive: maximize similarity for positive pairs, minimize for negatives
    • In-batch negatives: use other batch documents as negatives
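The contrastive objective with in-batch negatives can be sketched in plain numpy. This is an illustrative loss computation on random vectors, not a real training loop; in practice the vectors would come from the encoder and gradients would flow through it:

```python
import numpy as np

def in_batch_contrastive_loss(q_vecs, d_vecs, temperature=0.05):
    """Row i of the similarity matrix scores query i against every document
    in the batch; the diagonal entries are the positive pairs. This is
    softmax cross-entropy against the diagonal (InfoNCE)."""
    sims = (q_vecs @ d_vecs.T) / temperature
    sims = sims - sims.max(axis=1, keepdims=True)          # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
queries = l2_normalize(rng.normal(size=(4, 8)))
positives = l2_normalize(queries + 0.1 * rng.normal(size=(4, 8)))  # near-duplicates
randoms = l2_normalize(rng.normal(size=(4, 8)))                    # unrelated docs

# Matched (query, doc) pairs should incur a much lower loss than random docs
loss_aligned = in_batch_contrastive_loss(queries, positives)
loss_random = in_batch_contrastive_loss(queries, randoms)
```

In-batch negatives make every other document in the batch serve double duty as a negative for each query, which is why larger batches tend to give a stronger contrastive signal.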

Example

# Encoder: a 768-dim Sentence-Transformer model
# (assumes the sentence-transformers package; any bi-encoder checkpoint works)
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-mpnet-base-v2")  # 768-dim embeddings

# Pre-indexed documents: encoded once, offline
docs = ["ML best practices guide", "Deep learning tutorials"]
doc_vecs = model.encode(docs)  # shape (2, 768), already computed

# At query time, encode only the query and rank by dot-product similarity
query_vec = model.encode("best practices for machine learning")
best = docs[int(np.argmax(doc_vecs @ query_vec))]
# No need to re-encode documents at query time (fast!)

Variants and history

Siamese networks date to the 1990s, where they were introduced for signature verification. Sentence-BERT (SBERT, 2019) combined pretrained sentence encoders with siamese fine-tuning objectives, and Dense Passage Retrieval (DPR, 2020) popularized bi-encoders for large-scale information retrieval. Modern variants include multi-vector bi-encoders (ColBERT-style, which keep one embedding per token) and instruction-tuned bi-encoders that adapt to different task descriptions.
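ColBERT-style multi-vector scoring ("late interaction", or MaxSim) can be sketched in a few lines of numpy. The token embeddings here are random and purely illustrative; a real system would produce them with a per-token encoder:

```python
import numpy as np

def maxsim_score(query_tokens, doc_tokens):
    """ColBERT-style MaxSim: for each query token embedding, take its best
    match over all document token embeddings, then sum across query tokens."""
    sims = query_tokens @ doc_tokens.T    # (n_query_tokens, n_doc_tokens)
    return sims.max(axis=1).sum()         # best doc token per query token

rng = np.random.default_rng(1)
q = rng.normal(size=(3, 16))  # 3 query tokens, 16-dim each (illustrative)

# A document containing near-matches for every query token, plus extra tokens
doc_match = np.vstack([q + 0.01 * rng.normal(size=(3, 16)),
                       rng.normal(size=(2, 16))])
# An unrelated document of 5 random tokens
doc_other = rng.normal(size=(5, 16))
```

Unlike a single-vector bi-encoder, MaxSim preserves token-level matching while still allowing document token embeddings to be pre-computed.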

When to use it

Choose bi-encoders when:

  • You need sub-100ms retrieval latency on large collections
  • Document set is static or infrequently updated
  • You can afford offline indexing and ANN storage
  • Speed is more important than maximum ranking accuracy
  • You’ll combine with reranking for final ranking

Bi-encoders are the workhorse of production dense retrieval: fast, but limited to similarity over pre-computed document embeddings. Use cross-encoders or rerankers for fine-grained second-stage ranking.
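That two-stage pattern can be sketched end to end. The `overlap_score` function below is a hypothetical stand-in for a real cross-encoder, which would jointly encode the query and each candidate document:

```python
import numpy as np

def retrieve_then_rerank(query, query_vec, docs, doc_vecs, score_fn, k=2):
    # Stage 1 (bi-encoder): cheap dot products over precomputed doc vectors
    candidates = np.argsort(doc_vecs @ query_vec)[::-1][:k]
    # Stage 2 (cross-encoder): score_fn sees query and document together
    order = sorted(candidates, key=lambda i: score_fn(query, docs[i]),
                   reverse=True)
    return [docs[i] for i in order]

# Hypothetical stand-in for a cross-encoder: shared-word count
def overlap_score(q, d):
    return len(set(q.lower().split()) & set(d.lower().split()))

docs = ["ML guide", "cooking tips", "ML best practices"]
doc_vecs = np.array([[0.9, 0.0], [0.1, 0.0], [0.8, 0.0]])  # toy embeddings
result = retrieve_then_rerank("ML best practices", np.array([1.0, 0.0]),
                              docs, doc_vecs, overlap_score)
```

The key property is that the expensive pairwise scorer only ever sees the k bi-encoder candidates, not the whole collection.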

See also