Approximate Nearest Neighbour
What it is
Approximate Nearest Neighbour (ANN) search quickly retrieves the k closest points to a query vector in high-dimensional space, trading exactness for dramatic speed gains. For billion-scale vector databases, ANN algorithms make dense retrieval practical by returning near-optimal results in milliseconds rather than minutes.
[illustrate: High-dimensional vector space; exhaustive search (red) vs. ANN search trajectory (blue); speed-accuracy tradeoff curve]
How it works
ANN algorithms trade recall (the fraction of true top-k neighbours found) for speed through four main index families:
- Graph-based (HNSW): vectors linked in a layered proximity graph; greedy search descends from sparse upper layers to a dense base layer
- Partitioning-based (IVF): divide the space into Voronoi cells around trained centroids; search only the cells nearest the query
- Quantization (PQ, OPQ): compress vectors into compact codes; rank candidates with approximate distances
- Hashing (LSH): hash vectors so that nearby points collide; retrieve candidates from the query's buckets
Most algorithms are approximate: they may miss true nearest neighbours but return very good candidates with high probability.
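The partitioning idea can be sketched in a few lines of numpy. This is a toy IVF-style index, not a production implementation: random centroids stand in for the k-means training a real index (e.g. FAISS's IVFFlat) would run, and all names and parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((10_000, 64)).astype(np.float32)

# "Train": pick random data points as cell centroids (a real index runs k-means).
n_cells = 100
centroids = data[rng.choice(len(data), n_cells, replace=False)]

def sq_dists(a, b):
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    return (a ** 2).sum(1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(1)[None, :]

# Build: invert the nearest-centroid assignment into one vector-id list per cell.
assignments = sq_dists(data, centroids).argmin(axis=1)
cells = [np.where(assignments == c)[0] for c in range(n_cells)]

def ivf_search(query, k=5, nprobe=5):
    # Probe only the nprobe cells whose centroids are closest to the query,
    # then rank the candidates in those cells with exact distances.
    cell_order = ((centroids - query) ** 2).sum(axis=1).argsort()[:nprobe]
    cand = np.concatenate([cells[c] for c in cell_order])
    d = ((data[cand] - query) ** 2).sum(axis=1)
    return cand[d.argsort()[:k]]
```

With nprobe=5 of 100 cells, each query scans roughly 5% of the data; raising nprobe raises both recall and latency, which is the core ANN dial.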
Example
# Exact nearest-neighbour: compare the query to all N vectors, O(N × d)
# Example: 1B vectors × 768 dims = 768B multiply-adds per query (minutes)
# HNSW ANN: navigate the graph in ~log(N) hops, O(log N × d)
# Example: 1B vectors → ~30 hops × 768 dims ≈ 23k multiply-adds (milliseconds)
# Tradeoff: might miss the true nearest neighbour in ~0.01% of queries, for a ~1000x speedup
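The exact-search baseline above can be made concrete. This sketch scales N down from the billion-vector example so it runs in memory; the cost still grows linearly with N, which is exactly what ANN indexes avoid.

```python
import numpy as np

# Exact (brute-force) top-k: one distance per stored vector, O(N × d).
# N is scaled down from the 1B × 768 example so the sketch fits in memory.
rng = np.random.default_rng(1)
N, d, k = 10_000, 768, 5
data = rng.standard_normal((N, d)).astype(np.float32)
query = rng.standard_normal(d).astype(np.float32)

dists = ((data - query) ** 2).sum(axis=1)  # N distance computations
top_k = np.argsort(dists)[:k]              # exact top-k, guaranteed

print(f"{N * d:,} multiply-adds for one exact query")
```

Because every query touches every vector, doubling the corpus doubles the latency; a graph index like HNSW instead touches only the vectors along a short navigation path.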
Variants and history
LSH (Locality Sensitive Hashing; Indyk & Motwani, 1998) appeared in the late 1990s. IVF (Inverted File) indexes, which cluster vectors into coarse cells, became popular around 2010, notably paired with product quantization (Jégou et al., 2011). HNSW (Hierarchical Navigable Small World; Malkov & Yashunin, 2018) came to dominate the 2020s with its simplicity and strong recall-speed tradeoff. Learned indices combine ANN structures with learned ranking models. Vector databases (Weaviate, Milvus, Qdrant) built production systems around ANN indexes.
When to use it
Use ANN when:
- Scaling dense retrieval to millions or billions of vectors
- Query latency is critical (<100ms required)
- 1–5% recall loss is acceptable
- You can afford indexing overhead
- Working with high-dimensional embeddings (768+ dimensions)
ANN is essential for production dense search. Trade-offs include memory overhead, tuning complexity (graph degree, search width), and occasional recall loss.
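The recall loss mentioned above can be measured directly by comparing an approximate result list against exact brute force. In this sketch a random candidate subsample stands in for a real index's search-width knob (efSearch in HNSW, nprobe in IVF); the function name and sizes are illustrative.

```python
import numpy as np

def recall_at_k(exact_ids, approx_ids):
    # Fraction of the true top-k neighbours the approximate search returned.
    return len(set(exact_ids) & set(approx_ids)) / len(exact_ids)

rng = np.random.default_rng(2)
data = rng.standard_normal((5_000, 32)).astype(np.float32)
query = rng.standard_normal(32).astype(np.float32)
k = 10

# Ground truth from exact brute-force search.
exact = np.argsort(((data - query) ** 2).sum(axis=1))[:k]

# Widening the candidate pool trades latency for recall; a random
# subsample is a crude stand-in for a real index's candidate generation.
for frac in (0.1, 0.5, 1.0):
    cand = rng.choice(len(data), int(frac * len(data)), replace=False)
    d = ((data[cand] - query) ** 2).sum(axis=1)
    approx = cand[np.argsort(d)[:k]]
    print(f"candidates={frac:.0%}  recall@{k}={recall_at_k(exact, approx):.2f}")
```

Tuning in practice looks like this loop: sweep the search-width parameter on a held-out query set with exact ground truth, then pick the smallest setting that meets the recall target within the latency budget.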