Efficiency
-
Binary Embeddings
Embeddings compressed to 1 bit per dimension, a 32x size reduction relative to float32; enables Hamming-distance similarity search using XOR and integer POPCNT operations, which shrinks the index and lowers retrieval latency.
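As a rough illustration of the idea, the sketch below binarizes toy vectors by sign and ranks them by Hamming distance. The sign-threshold rule, the helper names (`binarize`, `hamming`, `search`), and the sample vectors are assumptions made for this example; production systems typically pack bits into machine words and use a hardware popcount rather than Python integers.

```python
def binarize(vector):
    """Pack one sign bit per dimension into a single Python int (assumed rule: value > 0 -> 1)."""
    bits = 0
    for value in vector:
        bits = (bits << 1) | (1 if value > 0 else 0)
    return bits


def hamming(a, b):
    """Hamming distance: XOR the codes, then count the set bits (a software popcount)."""
    return bin(a ^ b).count("1")


def search(query_code, codes, k=1):
    """Return the indices of the k codes closest to the query in Hamming distance."""
    ranked = sorted(range(len(codes)), key=lambda i: hamming(query_code, codes[i]))
    return ranked[:k]


# Hypothetical 4-dimensional embeddings, used only for illustration.
docs = [
    [0.9, -0.2, 0.4, -0.7],
    [-0.5, 0.3, -0.1, 0.8],
    [0.7, -0.1, -0.3, -0.6],
]
codes = [binarize(d) for d in docs]
query_code = binarize([0.8, -0.3, 0.5, -0.9])
nearest = search(query_code, codes, k=2)
```

Here each 4-dimensional vector collapses to a 4-bit code, so comparing two vectors costs one XOR and one bit count instead of four floating-point multiplications.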
-
PLAID
Performance-optimized Late Interaction Driver; an efficient serving engine for ColBERT that uses centroid-based candidate filtering to avoid computing full MaxSim over the entire index.
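The centroid-filtering idea can be sketched in a few lines. This is a simplified illustration, not PLAID's actual implementation: the function names, the `nprobe` parameter, and the toy 2-dimensional token vectors are all assumptions, and steps such as residual compression and multi-stage pruning are left out. Each document token is assigned to its nearest centroid, a query probes only its best-matching centroids, and full MaxSim is computed just for the documents found there.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def nearest_centroid(vec, centroids):
    """Index of the centroid with the highest dot product against vec."""
    return max(range(len(centroids)), key=lambda c: dot(vec, centroids[c]))


def build_index(docs, centroids):
    """Map each centroid id to the set of doc ids that have a token assigned to it."""
    inverted = {c: set() for c in range(len(centroids))}
    for doc_id, tokens in enumerate(docs):
        for tok in tokens:
            inverted[nearest_centroid(tok, centroids)].add(doc_id)
    return inverted


def candidates(query, centroids, inverted, nprobe=1):
    """Collect docs reachable from the top-nprobe centroids of each query token."""
    found = set()
    for q in query:
        ranked = sorted(range(len(centroids)),
                        key=lambda c: dot(q, centroids[c]), reverse=True)
        for c in ranked[:nprobe]:
            found |= inverted[c]
    return found


def maxsim(query, doc):
    """Late-interaction score: sum over query tokens of the best-matching doc token."""
    return sum(max(dot(q, d) for d in doc) for q in query)


# Hypothetical centroids and token embeddings, for illustration only.
centroids = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
docs = [
    [[0.9, 0.1], [0.8, 0.2]],     # tokens near centroid 0
    [[0.1, 0.9], [0.2, 0.8]],     # tokens near centroid 1
    [[-0.9, 0.1], [-0.8, -0.1]],  # tokens near centroid 2
]
query = [[1.0, 0.0], [0.7, 0.3]]

inverted = build_index(docs, centroids)
cands = candidates(query, centroids, inverted, nprobe=1)
scores = {doc_id: maxsim(query, docs[doc_id]) for doc_id in cands}
best = max(scores, key=scores.get)
```

In this toy setup only one of the three documents survives filtering, so MaxSim runs once rather than three times; raising `nprobe` widens the candidate set at the cost of more scoring work.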