Approximate Nearest Neighbour

What it is

Approximate Nearest Neighbour (ANN) search quickly retrieves the k closest points to a query vector in high-dimensional space, trading a small amount of accuracy for dramatic speed gains. For billion-scale vector databases, ANN algorithms make dense retrieval practical by returning near-optimal results in milliseconds rather than seconds.

[illustrate: High-dimensional vector space; exhaustive search (red) vs. ANN search trajectory (blue); speed-accuracy tradeoff curve]
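
For contrast, the exact search that ANN approximates is a single brute-force scan. A minimal numpy sketch over random stand-in data (sizes are illustrative):

import numpy as np

rng = np.random.default_rng(0)
vectors = rng.standard_normal((100_000, 768)).astype(np.float32)  # the database
query = rng.standard_normal(768).astype(np.float32)

# Exact k-NN: measure the distance from the query to every vector, then sort.
k = 10
dists = np.linalg.norm(vectors - query, axis=1)  # O(N x d) work per query
top_k = np.argsort(dists)[:k]                    # indices of the k closest points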

How it works

ANN algorithms trade recall (the fraction of the true top-k actually found) for speed through four main families, sketched in code after this list:

  1. Graph-based (HNSW): points join a layered graph, with layer membership assigned probabilistically; search navigates from a sparse top layer down to the dense bottom layer
  2. Partitioning-based (IVF): divide the space into Voronoi cells via clustering; search only the cells closest to the query
  3. Quantization (PQ, OPQ): compress vectors into short low-precision codes; compute cheap approximate distances against the codes
  4. Hashing (LSH): hash similar vectors into the same buckets; retrieve candidates from the query's buckets

All of these are approximate by design: they may miss some true nearest neighbours, but they return near-optimal candidates with high probability.
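
A minimal sketch of the graph-based family using the hnswlib library; the dataset is random and the parameter values (M, ef_construction, ef) are illustrative rather than tuned:

import numpy as np
import hnswlib

dim, n = 768, 100_000
data = np.random.rand(n, dim).astype(np.float32)  # stand-in for real embeddings

# Build the hierarchical graph: M is the graph degree (links per node),
# ef_construction the candidate-list width used during construction.
index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, np.arange(n))

index.set_ef(64)  # search width: higher means better recall, slower queries
labels, dists = index.knn_query(data[:1], k=10)

The partitioning and quantization families are both implemented in the faiss library; this sketch combines them in a single IVF+PQ index, reusing data and dim from above (nlist, nprobe, and the PQ settings are again illustrative):

import faiss

nlist = 1024                        # number of Voronoi cells
quantizer = faiss.IndexFlatL2(dim)  # coarse quantizer that defines the cells
index = faiss.IndexIVFPQ(quantizer, dim, nlist, 64, 8)  # 64 sub-vectors, 8 bits each

index.train(data)   # learn the cell centroids and the PQ codebooks
index.add(data)
index.nprobe = 16   # search only the 16 cells closest to the query
dists, labels = index.search(data[:1], 10)

Random-hyperplane LSH is simple enough to sketch directly in numpy; lsh_key is a hypothetical helper for illustration, not part of any library:

n_bits = 16
planes = np.random.randn(n_bits, dim)  # one random hyperplane per hash bit

def lsh_key(v):
    # Each bit records which side of a hyperplane v falls on;
    # nearby vectors tend to fall on the same sides.
    return ((planes @ v) > 0).tobytes()

buckets = {}
for i, v in enumerate(data):
    buckets.setdefault(lsh_key(v), []).append(i)

# At query time, only vectors in the query's bucket are compared exactly.
candidates = buckets.get(lsh_key(data[0]), [])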

Example

# Exact nearest-neighbour search compares the query to all N vectors: O(N × d)
# Example: 1B vectors × 768 dims ≈ 768B scalar operations per query (minutes)

# HNSW ANN navigates the graph in roughly log2(N) hops: O(log N × d)
# Example: 1B vectors → ~30 hops × 768 dims ≈ 23k operations (ms); each hop
# actually evaluates several neighbours, so real counts are a few times higher

# Tradeoff: a well-tuned index might miss the true nearest neighbour in a tiny
# fraction of queries (on the order of 0.01%), for a speedup of 1000x or more
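
One way to quantify this tradeoff in practice is to measure recall@k against a brute-force scan over a sample of queries. A sketch using numpy and hnswlib, with illustrative sizes:

import numpy as np
import hnswlib

dim, n, k = 128, 50_000, 10
rng = np.random.default_rng(0)
data = rng.standard_normal((n, dim)).astype(np.float32)
queries = data[:100]

index = hnswlib.Index(space='l2', dim=dim)
index.init_index(max_elements=n, M=16, ef_construction=200)
index.add_items(data, np.arange(n))
index.set_ef(64)
approx, _ = index.knn_query(queries, k=k)

# Ground truth by brute force: the true top-k for each query.
exact = np.array([np.argsort(np.linalg.norm(data - q, axis=1))[:k] for q in queries])

# recall@k: the fraction of true top-k neighbours the ANN index returned.
recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx, exact)])
print(f"recall@{k} = {recall:.3f}")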

Variants and history

LSH (Locality Sensitive Hashing) appeared in the late 1990s. IVF (Inverted File) indexing, which partitions vectors into cells via k-means clustering, became popular around 2010 for large-scale search. HNSW (Hierarchical Navigable Small World; Malkov & Yashunin, 2018) came to dominate the 2020s thanks to its simplicity and strong recall-speed tradeoff. Learned indices combine ANN structures with learned ranking models. Vector databases (Weaviate, Milvus, Qdrant) built production systems around ANN indexes.

When to use it

Use ANN when:

  • Scaling dense retrieval to millions or billions of vectors
  • Query latency is critical (<100ms required)
  • 1–5% recall loss is acceptable
  • You can afford indexing overhead
  • Working with high-dimensional embeddings (768+ dimensions)

ANN is essential for production dense search. Trade-offs include memory overhead, tuning complexity (e.g. HNSW's graph degree and search width, swept in the sketch below), and occasional recall loss.
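
A tuning pass often reduces to sweeping one knob and reading off the recall/latency curve. A sketch for HNSW's search width, reusing the index, queries, ground truth (exact), and k from the recall sketch above:

import time

for ef in (16, 32, 64, 128, 256):
    index.set_ef(ef)  # widen the candidate list at query time
    t0 = time.perf_counter()
    approx, _ = index.knn_query(queries, k=k)
    latency_ms = (time.perf_counter() - t0) / len(queries) * 1e3
    recall = np.mean([len(set(a) & set(e)) / k for a, e in zip(approx, exact)])
    print(f"ef={ef:4d}  recall@{k}={recall:.3f}  latency={latency_ms:.3f} ms/query")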

See also