Billion-scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering

Jan-23-2025–arXiv.org Artificial Intelligence

Similarity search, the task of finding the vectors most similar to a query vector, has become a fundamental operation in machine learning, with applications in recommendation engines, semantic search systems, and more [1-3]. As datasets grow to billions of entries, searching high-dimensional vectors efficiently becomes increasingly difficult [4]. The difficulty is compounded by the well-known curse of dimensionality [5], which degrades the performance and accuracy of search algorithms as the number of dimensions increases. Approximate Nearest Neighbor (ANN) algorithms, such as Inverted File Index (IVF) [6] and Hierarchical Navigable Small World (HNSW) [7], have been developed to address these scalability and performance issues. IVF partitions the search space into smaller regions, called Voronoi cells [8], while HNSW constructs a navigable graph structure for efficient traversal of the search space. Despite these advances, both methods often struggle to support complex, multi-dimensional filtering efficiently. Such filtering is crucial in practical scenarios where criteria beyond vector similarity are required to refine search results [6]. Examples include e-commerce product search, semantic search with filtering, and recommendation systems.
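To make the IVF idea and the filtering requirement concrete, here is a minimal sketch in pure Python. It is not the paper's hybrid index: the function names (`build_ivf`, `search`), the `nprobe` and `predicate` parameters, and the `in_stock` attribute are all hypothetical, and the centroids are sampled at random rather than trained with k-means, which is an assumption made to keep the example short. Each vector is assigned to its nearest centroid (its Voronoi cell), and a query scans only the `nprobe` closest cells while discarding items that fail an attribute filter.

```python
import random

def l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, n_cells, seed=0):
    """Sample n_cells vectors as centroids (assumption: no k-means training)
    and assign every vector index to the cell of its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_cells)
    cells = {c: [] for c in range(n_cells)}
    for idx, v in enumerate(vectors):
        nearest = min(range(n_cells), key=lambda c: l2(v, centroids[c]))
        cells[nearest].append(idx)
    return centroids, cells

def search(query, vectors, attrs, centroids, cells, k, nprobe, predicate):
    """Scan the nprobe cells closest to the query, keep only items whose
    attributes satisfy predicate, and return the ids of the k nearest."""
    probe = sorted(range(len(centroids)),
                   key=lambda c: l2(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in cells[c] if predicate(attrs[i])]
    candidates.sort(key=lambda i: l2(query, vectors[i]))
    return candidates[:k]

# Toy data: 500 random 8-dimensional vectors, every other one "in stock".
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(500)]
attrs = [{"in_stock": i % 2 == 0} for i in range(500)]

centroids, cells = build_ivf(vectors, n_cells=10)
hits = search(vectors[0], vectors, attrs, centroids, cells,
              k=5, nprobe=3, predicate=lambda a: a["in_stock"])
print(hits)
```

Raising `nprobe` trades speed for recall: probing every cell reduces to an exact filtered scan, while probing few cells is faster but can miss neighbors that fall in unvisited cells. A restrictive filter can also leave fewer than `k` candidates in the probed cells, which is one way filtering and ANN indexing interact poorly.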


