Billion-scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering
Emanuilov, Simeon, Dimov, Aleksandar
arXiv.org Artificial Intelligence
Similarity search, the task of finding similar vectors, has become a fundamental operation in machine learning, with applications in recommendation engines, semantic search systems, and more [1-3]. As datasets grow to billions of entries, searching high-dimensional vectors efficiently becomes increasingly difficult [4]. This is further compounded by the well-known curse of dimensionality [5], which degrades the performance and accuracy of search algorithms as the number of dimensions increases. Approximate Nearest Neighbor (ANN) algorithms, such as Inverted File Index (IVF) [6] and Hierarchical Navigable Small World (HNSW) [7], have been developed to address these scalability and performance issues. IVF partitions the search space into smaller regions called Voronoi cells [8], while HNSW constructs a navigable graph structure for efficient traversal of the search space. Despite these advances, both methods often struggle to support complex, multi-dimensional filtering efficiently. Such filtering is crucial in practical scenarios where criteria beyond vector similarity are required to refine search results [6]. Examples include e-commerce product search, semantic search with filtering, and recommendation systems.
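To make the IVF idea and the filtering problem concrete, the following is a minimal sketch in plain Python. It is not the hybrid index proposed in the paper: it is a textbook-style IVF (k-means centroids define the Voronoi cells, and a query probes only the nearest cells) with a metadata predicate applied to candidates while the probed cells are scanned. All names (`build_ivf`, `search`, `n_probe`, `predicate`, the `category` field) and the random toy data are illustrative assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v, n=1):
    """Indices of the n centroids closest to v, nearest first."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))
    return order[:n]


def assign(vectors, centroids):
    """Put each vector index into the cell of its nearest centroid."""
    cells = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        cells[nearest(centroids, v)[0]].append(idx)
    return cells


def build_ivf(vectors, n_cells, iters=10, seed=0):
    """Run a few k-means iterations; return (centroids, cells)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, n_cells)]
    dim = len(vectors[0])
    for _ in range(iters):
        cells = assign(vectors, centroids)
        for c, members in enumerate(cells):
            if members:  # keep the old centroid if a cell is empty
                centroids[c] = [
                    sum(vectors[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    # Final assignment so that cells match the final centroids.
    return centroids, assign(vectors, centroids)


def search(query, vectors, metadata, centroids, cells,
           k=3, n_probe=2, predicate=None):
    """Scan the n_probe nearest cells, keep items passing the predicate,
    and return the indices of the k closest survivors."""
    candidates = []
    for c in nearest(centroids, query, n_probe):
        for idx in cells[c]:
            if predicate is None or predicate(metadata[idx]):
                candidates.append((sq_dist(vectors[idx], query), idx))
    candidates.sort()
    return [idx for _, idx in candidates[:k]]


# Toy data: 200 random 8-dimensional vectors with one metadata attribute.
rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(200)]
metadata = [{"category": "book" if i % 2 == 0 else "toy"} for i in range(200)]

centroids, cells = build_ivf(vectors, n_cells=8)
hits = search(vectors[10], vectors, metadata, centroids, cells,
              k=3, predicate=lambda m: m["category"] == "book")
print(hits)
```

The sketch also hints at why filtering is hard for such indexes: with a selective predicate, the probed cells may contain fewer than `k` matching items, so either `n_probe` has to grow (more work per query) or recall drops.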
Jan-23-2025
- Country:
- Europe > Bulgaria
- Sofia City Province > Sofia (0.04)
- North America > United States
- California > Alameda County > Berkeley (0.04)
- Genre:
- Research Report (1.00)
- Industry:
- Information Technology > Services (0.34)
- Technology: