Incremental IVF Index Maintenance for Streaming Vector Search

Mohoney, Jason; Pacaci, Anil; Chowdhury, Shihabur Rahman; Minhas, Umar Farooq; Pound, Jeffery; Renggli, Cedric; Reyhani, Nima; Ilyas, Ihab F.; Rekatsinas, Theodoros; Venkataraman, Shivaram

Nov-1-2024–arXiv.org Artificial Intelligence

The prevalence of vector similarity search in modern machine learning applications and the continuously changing nature of data processed by these applications necessitate efficient and effective index maintenance techniques for vector search indexes. Designed primarily for static workloads, existing vector search indexes degrade in search quality and performance as the underlying data is updated unless costly index reconstruction is performed. To address this, we introduce Ada-IVF, an incremental indexing methodology for Inverted File (IVF) indexes. Ada-IVF consists of 1) an adaptive maintenance policy that decides which index partitions are problematic for performance and should be repartitioned and 2) a local re-clustering mechanism that determines how to repartition them.

IVF indexes out-of-the-box do not have the notion of inserting new vectors or deleting existing vectors once constructed. Indeed, the most common method used by practitioners today is to rebuild the index from scratch to reflect any updates that have accumulated over time. However, depending on the scale of the vector dataset and the volume and frequency of updates, a full index rebuild can be prohibitively expensive. For example, it takes multiple days to rebuild an IVF index from scratch for billion-scale vector datasets [21, 69], making it necessary to revisit how updates can be reflected. Devising such an update mechanism consists of readjusting the partitioning of the high-dimensional space defined by...
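To make the two components named in the abstract concrete, the following is a minimal sketch of an IVF-style index that supports inserts and deletes, flags partitions for maintenance, and re-clusters only the flagged ones. It is not the Ada-IVF algorithm: the class `ToyIVF`, the size threshold `max_size`, and the 2-means split are all assumptions made for illustration, standing in for the adaptive policy and local re-clustering mechanism that the paper itself defines.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


class ToyIVF:
    """Hypothetical toy IVF index: one inverted list (id -> vector) per centroid."""

    def __init__(self, centroids, max_size=4):
        self.centroids = [list(c) for c in centroids]
        self.lists = [dict() for _ in centroids]
        self.max_size = max_size  # assumed trigger; the paper's policy is adaptive

    def _nearest(self, v):
        return min(range(len(self.centroids)),
                   key=lambda i: dist2(v, self.centroids[i]))

    def insert(self, vid, v):
        # Append the vector to the inverted list of its nearest centroid.
        self.lists[self._nearest(v)][vid] = list(v)

    def delete(self, vid):
        # Remove the vector from whichever partition holds it.
        for part in self.lists:
            if part.pop(vid, None) is not None:
                return True
        return False

    def search(self, q, k=1, nprobe=1):
        # Scan only the nprobe partitions whose centroids are closest to q.
        probe = sorted(range(len(self.centroids)),
                       key=lambda i: dist2(q, self.centroids[i]))[:nprobe]
        cands = [(dist2(q, v), vid)
                 for i in probe for vid, v in self.lists[i].items()]
        return [vid for _, vid in sorted(cands)[:k]]

    def flagged(self):
        # Stand-in maintenance policy: flag partitions that grew too large.
        return [i for i, part in enumerate(self.lists)
                if len(part) > self.max_size]

    def split(self, i, iters=10):
        # Stand-in local re-clustering: 2-means over one partition only.
        items = list(self.lists[i].items())
        if len(items) < 2:
            return
        a = items[0][1]
        b = max(items, key=lambda kv: dist2(a, kv[1]))[1]
        for _ in range(iters):
            left = [kv for kv in items if dist2(kv[1], a) <= dist2(kv[1], b)]
            right = [kv for kv in items if dist2(kv[1], a) > dist2(kv[1], b)]
            if not left or not right:
                return  # degenerate partition; leave it unchanged
            a = mean([v for _, v in left])
            b = mean([v for _, v in right])
        self.centroids[i], self.lists[i] = a, dict(left)
        self.centroids.append(b)
        self.lists.append(dict(right))

    def maintain(self):
        # Repartition only flagged partitions; the rest of the index is untouched.
        for i in self.flagged():
            self.split(i)


index = ToyIVF(centroids=[[0.0, 0.0], [10.0, 10.0]], max_size=4)
points = [[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3]]
for vid, v in enumerate(points):
    index.insert(vid, v)
index.maintain()
```

In this toy run all six vectors land in the first partition, which exceeds the assumed threshold and is split in two, while the second partition is never touched. That locality is the point of the sketch: maintenance cost scales with the flagged partitions rather than with the whole index, in contrast to a full rebuild.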


