IVF Index
What it is
IVF (Inverted File) is a partitioning-based approximate nearest-neighbour method. It divides the vector space into regions (Voronoi cells) around k-means centroids and stores, for each centroid, the list of vectors falling in its cell (the "inverted file"). At query time, only vectors in a few cells near the query are examined, dramatically reducing computation.
[illustrate: 2D vector space with Voronoi partition (cell boundaries); query point (red) with search radius (circle); candidate cells highlighted]
How it works
Offline indexing:
- Run k-means clustering on vectors to find k centroids
- Assign each vector to nearest centroid
- Create inverted file: centroid → [vector_ids, vectors]
- Optional: sort vectors by distance to centroid for early termination
Online search:
- Find the nprobe centroids nearest the query (a small fraction of nlist)
- Retrieve all vectors from these cells
- Compute exact distances and return top-k
Parameters:
- nlist: number of centroids (thousands to millions; a common heuristic is on the order of √N for N vectors)
- nprobe: centroids to search at query time (speed-recall tradeoff)
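The offline and online steps above can be put together in a minimal Python/NumPy sketch. The function names (`build_ivf`, `ivf_search`) and the plain Lloyd's-iteration k-means are illustrative assumptions, not any library's API:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns a (k, d) centroid matrix."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each vector to its nearest centroid, then recompute means.
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = d.argmin(axis=1)
        for c in range(k):
            members = X[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    return centroids

def build_ivf(X, nlist):
    """Offline indexing: cluster, then invert centroid -> member vector ids."""
    centroids = kmeans(X, nlist)
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    assign = d.argmin(axis=1)
    lists = {c: np.where(assign == c)[0] for c in range(nlist)}
    return centroids, lists

def ivf_search(q, X, centroids, lists, k=10, nprobe=4):
    """Online search: probe the nprobe nearest cells, exact top-k inside them."""
    cell_order = np.linalg.norm(centroids - q, axis=1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cell_order])
    dist = np.linalg.norm(X[cand] - q, axis=1)
    top = dist.argsort()[:k]
    return cand[top], dist[top]
```

Note that `lists` here stores only vector ids and distances are computed against the original matrix; a production index stores the (possibly compressed) vectors contiguously per cell for cache-friendly scanning.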
Example
# Index 1M vectors with IVF(nlist=1024, nprobe=20)
# k-means partitions space into 1024 cells
# Average ~1000 vectors per cell
# Query: find 10 nearest neighbours
# Find 20 nearest centroids to query
# Retrieve ~20k vectors from those cells (20 × 1k)
# Compute exact distances, return top-10
# Scans ~2% of the dataset: roughly 50× fewer distance computations than exhaustive search (e.g. tens of ms vs. ~1000ms)
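The numbers above can be sanity-checked with a back-of-envelope cost model. It counts distance computations only, which is illustrative rather than a latency predictor (real latency also depends on memory bandwidth, SIMD, and cache behaviour):

```python
# Cost model for the worked example: N=1M vectors, nlist=1024, nprobe=20.
N, nlist, nprobe = 1_000_000, 1024, 20

per_cell = N / nlist            # average cell size, ~977 vectors
candidates = nprobe * per_cell  # vectors actually scanned, ~19.5k (~2% of N)
ivf_cost = nlist + candidates   # plus nlist comparisons to pick the cells
speedup = N / ivf_cost          # ~49x fewer distance computations

print(f"scan {candidates:.0f} of {N} vectors, ~{speedup:.0f}x speedup")
```

The `nlist` term in `ivf_cost` is why nlist cannot grow without bound: with very many cells, finding the nearest centroids itself becomes the bottleneck (the motivation for the IVF-HNSW variant below).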
Variants and history
IVF emerged in the 2000s as a practical method for billion-scale retrieval. Modern variants include IVF-Flat (no quantization), IVF-PQ (combined with product quantization for compression), IVF-HNSW (hierarchical cell search), and GPU-accelerated IVF (fast distance computation). IVF remains popular for its speed and simplicity; HNSW has gained ground due to better recall-speed tradeoff.
When to use it
Choose IVF when:
- You need fast indexing (faster than HNSW)
- GPU acceleration is available
- Memory is tight (combined with PQ)
- Your vector distribution is relatively uniform
- You want a single post-build knob (nprobe) for the recall-latency tradeoff
IVF is faster to build and often faster on GPUs than HNSW, but HNSW typically achieves better recall at the same latency on CPUs.
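The nprobe knob can be demonstrated directly: because cells are probed in order of centroid distance, raising nprobe scans a superset of the previous candidates, so recall@k can only stay the same or improve. A toy experiment, with synthetic data and a deliberately crude one-pass clustering (both assumptions for brevity):

```python
import numpy as np

# Toy demonstration of the nprobe recall-latency knob on synthetic data.
rng = np.random.default_rng(0)
X = rng.standard_normal((5000, 32)).astype(np.float32)
queries = rng.standard_normal((50, 32)).astype(np.float32)
nlist, k = 64, 10

# Crude partition: random vectors as centroids, one assignment pass
# (a real index would run several Lloyd's iterations).
centroids = X[rng.choice(len(X), nlist, replace=False)]
assign = np.linalg.norm(X[:, None] - centroids[None], axis=2).argmin(axis=1)
lists = [np.where(assign == c)[0] for c in range(nlist)]

def search(q, nprobe):
    # Probe the nprobe cells whose centroids are closest to the query.
    cells = np.linalg.norm(centroids - q, axis=1).argsort()[:nprobe]
    cand = np.concatenate([lists[c] for c in cells])
    return cand[np.linalg.norm(X[cand] - q, axis=1).argsort()[:k]]

# Exact top-k per query as ground truth for recall.
exact = [np.linalg.norm(X - q, axis=1).argsort()[:k] for q in queries]
recalls = []
for p in (1, 4, 16, nlist):
    hits = sum(len(np.intersect1d(search(q, p), e))
               for q, e in zip(queries, exact))
    recalls.append(hits / (len(queries) * k))
    print(f"nprobe={p:2d}  recall@{k}={recalls[-1]:.2f}")
```

At nprobe = nlist every cell is scanned and recall reaches 1.0, at exhaustive-search cost; tuning means picking the smallest nprobe whose recall meets your target.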