Efficient Sketching and Nearest Neighbor Search Algorithms for Sparse Vector Sets

Bruch, Sebastian, Nardini, Franco Maria, Rulli, Cosimo, Venturini, Rossano

Sep-30-2025–arXiv.org Artificial Intelligence

Sparse embeddings of data form an attractive class due to their inherent interpretability: Every dimension is tied to a term in some vocabulary, making it easy to visually decipher the latent space. Sparsity, however, poses unique challenges for Approximate Nearest Neighbor Search (ANNS) which finds, from a collection of vectors, the k vectors closest to a query. To encourage research on this underexplored topic, sparse ANNS featured prominently in a BigANN Challenge at NeurIPS 2023, where approximate algorithms were evaluated on large benchmark datasets by throughput and accuracy. In this work, we introduce a set of novel data structures and algorithmic methods, a combination of which leads to an elegant, effective, and highly efficient solution to sparse ANNS. Our contributions range from a theoretically-grounded sketching algorithm for sparse vectors to reduce their effective dimensionality while preserving inner product-induced ranks; a geometric organization of the inverted index; and the blending of local and global information to improve the efficiency and efficacy of ANNS. Empirically, our final algorithm, dubbed Seismic, reaches sub-millisecond per-query latency with high accuracy on a large-scale benchmark dataset using a single CPU.

information retrieval, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

Sep-30-2025

arXiv.org PDF

Add feedback

Country:
- North America
  - Canada (0.04)
  - United States
    - District of Columbia > Washington (0.04)
    - Texas > Coleman County (0.04)
    - Massachusetts
      - Suffolk County > Boston (0.04)
      - Hampshire County > Northampton (0.04)
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - Idaho > Ada County
      - Boise (0.04)
    - Georgia > Fulton County
      - Atlanta (0.04)
    - Florida > Hillsborough County
      - University (0.40)
- Europe
  - Switzerland (0.04)
  - United Kingdom > North Sea
    - Southern North Sea (0.04)
  - Spain > Galicia
    - Madrid (0.04)
  - Italy
    - Lazio > Rome (0.04)
    - Tuscany > Pisa Province
      - Pisa (0.04)
    - Piedmont > Turin Province
      - Turin (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - Taiwan > Taiwan Province
    - Taipei (0.04)
  - Japan > Honshū
    - Kantō > Tokyo Metropolis Prefecture > Tokyo (0.28)
  - China > Beijing
    - Beijing (0.04)

Genre:
- Research Report (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning > Statistical Learning (0.67)
  - Natural Language
    - Information Retrieval (1.00)
    - Machine Translation (0.67)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found