Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets
Bruch, Sebastian, Nardini, Franco Maria, Rulli, Cosimo, Venturini, Rossano
–arXiv.org Artificial Intelligence
Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.
arXiv.org Artificial Intelligence
Sep-30-2025
- Country:
- North America
- Canada (0.04)
- United States
- District of Columbia > Washington (0.04)
- Texas > Coleman County (0.04)
- Massachusetts
- Suffolk County > Boston (0.04)
- Hampshire County > Northampton (0.04)
- Louisiana > Orleans Parish
- New Orleans (0.04)
- Idaho > Ada County
- Boise (0.04)
- Georgia > Fulton County
- Atlanta (0.04)
- Florida > Hillsborough County
- University (0.40)
- Europe
- Switzerland (0.04)
- United Kingdom > North Sea
- Southern North Sea (0.04)
- Spain > Galicia
- Madrid (0.04)
- Italy
- Lazio > Rome (0.04)
- Tuscany > Pisa Province
- Pisa (0.04)
- Piedmont > Turin Province
- Turin (0.04)
- France > Île-de-France
- Asia
- Middle East > Jordan (0.04)
- Taiwan > Taiwan Province
- Taipei (0.04)
- Japan > Honshū
- Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
- China > Beijing
- Beijing (0.04)
- North America
- Genre:
- Research Report (1.00)
- Technology: