 vector search


LEANN: A Low-Storage Vector Index

Wang, Yichuan, Li, Zhifei, Liu, Shu, Wu, Yongji, Mao, Ziming, Zhao, Yilong, Yan, Xiao, Xu, Zhiying, Zhou, Yang, Stoica, Ion, Min, Sewon, Zaharia, Matei, Gonzalez, Joseph E.

arXiv.org Artificial Intelligence

Embedding-based vector search underpins many important applications, such as recommendation and retrieval-augmented generation (RAG). It relies on vector indices to enable efficient search. However, these indices require storing high-dimensional embeddings and large index metadata, whose total size can be several times larger than the original data (e.g., text chunks). Such high storage overhead makes it difficult, or even impractical, to deploy vector search on personal devices or large-scale datasets. To tackle this problem, we propose LEANN, a storage-efficient index for vector search that recomputes embeddings on the fly instead of storing them, and compresses state-of-the-art proximity graph indices while preserving search accuracy. LEANN delivers high-quality vector search while using only a fraction of the storage (e.g., 5% of the original data) and supporting storage-efficient index construction and updates. On real-world benchmarks, LEANN reduces index size by up to 50x compared with conventional indices, while maintaining SOTA accuracy and comparable latency for RAG applications.
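
The recompute-instead-of-store idea in the abstract above can be illustrated with a toy sketch. This is not LEANN's algorithm: `toy_embed`, the five-node graph, and the greedy walk are all hypothetical stand-ins, assuming only that the index keeps raw text plus graph edges and calls an embedding function whenever a node is visited.

```python
import math

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: counts of 'a', 'b', 'c'."""
    return [text.count(ch) for ch in "abc"]

def greedy_search(chunks, neighbors, query_vec, entry, embed=toy_embed):
    """Greedy walk over a proximity graph; no embeddings are stored.

    Returns the node the walk stops at and the number of embed calls made.
    """
    calls = 0

    def dist_to(node):
        nonlocal calls
        calls += 1  # every distance costs one fresh embedding computation
        return math.dist(embed(chunks[node]), query_vec)

    current, best = entry, dist_to(entry)
    while True:
        scored = [(dist_to(nb), nb) for nb in neighbors[current]]
        if not scored:
            break
        d, nb = min(scored)
        if d >= best:  # no neighbor is closer: local optimum reached
            break
        current, best = nb, d
    return current, calls

# Only raw text and adjacency lists are kept (assumed toy data).
chunks = {0: "aaa", 1: "aab", 2: "abb", 3: "bbb", 4: "ccc"}
neighbors = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2], 4: [0]}

node, embed_calls = greedy_search(chunks, neighbors, toy_embed("bbb"), entry=0)
```

The sketch trades compute for storage: nodes seen more than once are re-embedded each time, and nothing beyond text and edges is persisted.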


Path-Constrained Retrieval: A Structural Approach to Reliable LLM Agent Reasoning Through Graph-Scoped Semantic Search

Oladokun, Joseph

arXiv.org Artificial Intelligence

Large Language Model (LLM) agents have shown remarkable capabilities in reasoning and problem-solving when augmented with retrieval mechanisms [1, 2]. However, a critical challenge persists: ensuring that retrieved information maintains logical and structural consistency with the agent's current reasoning context. Traditional retrieval methods, such as vector similarity search, retrieve information based solely on semantic similarity, without considering structural relationships within knowledge bases. This limitation becomes particularly problematic in multi-hop reasoning scenarios, where an agent must traverse a knowledge graph to answer complex queries. When an agent is reasoning about a specific concept (the "anchor"), retrieving information from structurally disconnected parts of the knowledge graph can introduce inconsistencies and contradictions into the reasoning process. For example, if an agent is reasoning about "cloud computing architecture" starting from a specific node, retrieving information about unrelated topics that happen to be semantically similar can lead to incoherent reasoning chains due to lack of structural consistency. We propose Path-Constrained Retrieval (PCR), a retrieval method that enforces structural constraints by restricting the search space to nodes reachable from an anchor node in a knowledge graph.
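
A minimal sketch of the restriction described above, assuming a directed graph stored as adjacency lists and cosine similarity for ranking. The function names, the toy graph, and the choice to include the anchor itself among the candidates are assumptions for illustration, not details taken from the paper.

```python
import math
from collections import deque

def reachable(graph, anchor):
    """Return every node reachable from `anchor` (anchor included) via BFS."""
    seen = {anchor}
    queue = deque([anchor])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def path_constrained_search(graph, embeddings, anchor, query_vec, k=3):
    """Rank only the nodes reachable from the anchor by similarity to the query."""
    candidates = reachable(graph, anchor)
    scored = [(cosine(embeddings[n], query_vec), n) for n in candidates if n in embeddings]
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [n for _, n in scored[:k]]

# Toy data (hypothetical): "cumulus" is semantically close to the query
# but structurally disconnected from the "cloud" anchor.
graph = {"cloud": ["vm", "storage"], "vm": ["hypervisor"], "weather": ["cumulus"]}
embeddings = {
    "cloud": [1.0, 0.0], "vm": [0.8, 0.2], "storage": [0.6, 0.4],
    "hypervisor": [0.7, 0.3], "weather": [0.9, 0.1], "cumulus": [1.0, 0.05],
}
results = path_constrained_search(graph, embeddings, "cloud", [1.0, 0.0], k=3)
```

An unconstrained search over the same embeddings would rank "cumulus" near the top; the reachability filter removes it before scoring.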


Text2VectorSQL: Towards a Unified Interface for Vector Search and SQL Queries

Wang, Zhengren, Yao, Dongwen, Li, Bozhou, Ma, Dongsheng, Li, Bo, Li, Zhiyu, Xiong, Feiyu, Cui, Bin, Tang, Linpeng, Zhang, Wentao

arXiv.org Artificial Intelligence

The proliferation of unstructured data poses a fundamental challenge to traditional database interfaces. While Text-to-SQL has democratized access to structured data, it remains incapable of interpreting semantic or multi-modal queries. Concurrently, vector search has emerged as the de facto standard for querying unstructured data, but its integration with SQL, termed VectorSQL, still relies on manual query crafting and lacks standardized evaluation methodologies, creating a significant gap between its potential and practical application. To bridge this fundamental gap, we introduce and formalize Text2VectorSQL, a novel task to establish a unified natural language interface for seamlessly querying both structured and unstructured data. To catalyze research in this new domain, we present a comprehensive foundational ecosystem, including: (1) A scalable and robust pipeline for synthesizing high-quality Text-to-VectorSQL training data. (2) VectorSQLBench, the first large-scale, multi-faceted benchmark for this task, encompassing 12 distinct combinations across three database backends (SQLite, PostgreSQL, ClickHouse) and four data sources (BIRD, Spider, arXiv, Wikipedia). (3) Several novel evaluation metrics designed for more nuanced performance analysis. Extensive experiments not only confirm strong baseline performance with our trained models, but also reveal the recall degradation challenge: the integration of SQL filters with vector search can lead to more pronounced result omissions than in conventional filtered vector search. By defining the core task, delivering the essential data and evaluation infrastructure, and identifying key research challenges, our work lays the essential groundwork to build the next generation of unified and intelligent data interfaces. Our repository is available at https://github.com/OpenDCAI/Text2VectorSQL.
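
One common way a filter can cause omissions is post-filtering: take the top-k nearest rows first, then apply the predicate. The sketch below contrasts that with filtering first. It is an illustration only; the abstract does not say which mechanism drives the reported degradation, and the rows, distances, and function names are made up.

```python
def post_filter(rows, k, predicate):
    """Take the k nearest rows, then drop those that fail the filter."""
    top_k = sorted(rows, key=lambda r: r["distance"])[:k]
    return [r["id"] for r in top_k if predicate(r)]

def pre_filter(rows, k, predicate):
    """Apply the filter first, then take the k nearest survivors."""
    kept = [r for r in rows if predicate(r)]
    return [r["id"] for r in sorted(kept, key=lambda r: r["distance"])[:k]]

# Hypothetical rows: `distance` is the precomputed distance to the query vector.
rows = [
    {"id": 1, "distance": 0.10, "year": 2019},
    {"id": 2, "distance": 0.15, "year": 2024},
    {"id": 3, "distance": 0.20, "year": 2018},
    {"id": 4, "distance": 0.30, "year": 2024},
    {"id": 5, "distance": 0.45, "year": 2023},
]

def is_recent(row):
    return row["year"] >= 2023

post = post_filter(rows, 3, is_recent)  # only one of the 3 nearest passes
pre = pre_filter(rows, 3, is_recent)    # all three matching rows are found
```

With k = 3, post-filtering returns a single row even though three rows satisfy the predicate, which is the kind of result omission the abstract refers to.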


Comparing Lexical and Semantic Vector Search Methods When Classifying Medical Documents

Harris, Lee

arXiv.org Artificial Intelligence

Classification is a common AI problem, and vector search is a typical solution. This transforms a given body of text into a numerical representation, known as an embedding, and modern improvements to vector search focus on optimising speed and predictive accuracy. This is often achieved through neural methods that aim to learn language semantics. However, our results suggest that these are not always the best solution. Our task was to classify rigidly-structured medical documents according to their content, and we found that using off-the-shelf semantic vector search produced slightly worse predictive accuracy than creating a bespoke lexical vector search model, and that it required significantly more time to execute. These findings suggest that traditional methods deserve to be contenders in the information retrieval toolkit, despite the prevalence and success of neural models. Matching document terms against an explicit vocabulary (i.e., controlled dictionary or wordlist) is a well-established solution to the document classification (i.e., Automatic Indexing [2]) problem, but as [3] and [4] highlight, using humans to manually create a vocabulary may be costly and error-prone.


DO-RAG: A Domain-Specific QA Framework Using Knowledge Graph-Enhanced Retrieval-Augmented Generation

Opoku, David Osei, Sheng, Ming, Zhang, Yong

arXiv.org Artificial Intelligence

Domain-specific QA systems require not just generative fluency but high factual accuracy grounded in structured expert knowledge. While recent Retrieval-Augmented Generation (RAG) frameworks improve context recall, they struggle with integrating heterogeneous data and maintaining reasoning consistency. To address these challenges, we propose DO-RAG, a scalable and customizable hybrid QA framework that integrates multi-level knowledge graph construction with semantic vector retrieval. Our system employs a novel agentic chain-of-thought architecture to extract structured relationships from unstructured, multimodal documents, constructing dynamic knowledge graphs that enhance retrieval precision. At query time, DO-RAG fuses graph and vector retrieval results to generate context-aware responses, followed by hallucination mitigation via grounded refinement. Experimental evaluations in the database and electrical domains show near-perfect recall and over 94% answer relevancy, with DO-RAG outperforming baseline frameworks by up to 33.38%. By combining traceability, adaptability, and performance efficiency, DO-RAG offers a reliable foundation for multi-domain, high-precision QA at scale.


Bang for the Buck: Vector Search on Cloud CPUs

Kuffo, Leonardo, Boncz, Peter

arXiv.org Artificial Intelligence

Vector databases have emerged as a new type of system that supports efficient querying of high-dimensional vectors. Many of these offer their database as a service in the cloud. However, the variety of available CPUs and the lack of vector search benchmarks across CPUs make it difficult for users to choose one. In this study, we show that CPU microarchitectures available in the cloud perform significantly differently across vector search scenarios. For instance, in an IVF index on float32 vectors, AMD's Zen4 gives almost 3x more queries per second (QPS) compared to Intel's Sapphire Rapids, but for HNSW indexes, the tables turn. However, when looking at the number of queries per dollar (QP$), Graviton3 is the best option for most indexes and quantization settings, even over Graviton4 (Table 1). With this work, we hope to guide users in getting the best "bang for the buck" when deploying vector search systems.
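
The difference between ranking by QPS and ranking by QP$ can be shown with a short calculation. The formula below (queries per second times 3600, divided by the hourly price) is an assumed definition, since the abstract does not spell one out, and the instance names, throughputs, and prices are invented for illustration.

```python
def queries_per_dollar(qps, price_per_hour_usd):
    """Queries served for one dollar, assuming sustained throughput for an hour."""
    if price_per_hour_usd <= 0:
        raise ValueError("price must be positive")
    return qps * 3600.0 / price_per_hour_usd

# Hypothetical instances: (QPS, on-demand price in USD per hour).
instances = {
    "instance_a": (1200.0, 0.60),
    "instance_b": (900.0, 0.30),
}

by_qps = sorted(instances, key=lambda n: instances[n][0], reverse=True)
by_qpd = sorted(instances, key=lambda n: queries_per_dollar(*instances[n]), reverse=True)
```

Here `instance_a` wins on raw QPS, while `instance_b` serves more queries per dollar because its price is half as high for three quarters of the throughput.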


Building Scalable AI-Powered Applications with Cloud Databases: Architectures, Best Practices and Performance Considerations

Bhupathi, Santosh

arXiv.org Artificial Intelligence

This paper explores how cloud-native databases enable AI-driven applications by leveraging purpose-built technologies such as vector databases (pgvector), graph databases (AWS Neptune), NoSQL stores (Amazon DocumentDB, DynamoDB), and relational cloud databases (Aurora MySQL and PostgreSQL). It presents architectural patterns for integrating AI workloads with cloud databases, including Retrieval-Augmented Generation (RAG) [1] with LLMs, real-time data pipelines, AI-driven query optimization, and embeddings-based search. Performance benchmarks, scalability considerations, and cost-efficient strategies are evaluated to guide the design of AI-enabled applications. Real-world case studies from industries such as healthcare, finance, and customer experience illustrate how enterprises utilize cloud databases to enhance AI capabilities while ensuring security, governance, and compliance with enterprise and regulatory standards. By providing a comprehensive analysis of AI and cloud database integration, this paper serves as a practical guide for researchers, architects, and enterprises to build next-generation AI applications that optimize performance, scalability, and cost efficiency in cloud environments.


Machine learning and high dimensional vector search

Douze, Matthijs

arXiv.org Artificial Intelligence

Most high-dimensional vector search methods are based on statistical tools, signal processing approaches, or graph traversal algorithms. Statistical tools include random projections [15] and dimensionality reduction (PCA and the SVD). Signal processing is employed primarily to compress vectors with quantization [30, 4, 22]. Most recent indexing methods rely on graphs [34, 49, 3, 11] that are built with graph traversal heuristics. Vector search (VS) is used in machine learning (ML) for training data deduplication [39] and searching ML embeddings [28, 5]. Therefore, there are many research teams around the world that are competent in both fields.


Graph RAG-Tool Fusion

Lumer, Elias, Basavaraju, Pradeep Honaganahalli, Mason, Myles, Burke, James A., Subbiah, Vamse Kumar

arXiv.org Artificial Intelligence

Recent developments in retrieval-augmented generation (RAG) for selecting relevant tools from a tool knowledge base enable LLM agents to scale their complex tool calling capabilities to hundreds or thousands of external tools, APIs, or agents-as-tools. However, traditional RAG-based tool retrieval fails to capture structured dependencies between tools, limiting the retrieval accuracy of a retrieved tool's dependencies. For example, among a vector database of tools, a "get stock price" API requires a "stock ticker" parameter from a "get stock ticker" API, and both depend on OS-level internet connectivity tools. In this paper, we address this limitation by introducing Graph RAG-Tool Fusion, a novel plug-and-play approach that combines the strengths of vector-based retrieval with efficient graph traversal to capture all relevant tools (nodes) along with any nested dependencies (edges) within the predefined tool knowledge graph. We also present ToolLinkOS, a new tool selection benchmark of 573 fictional tools, spanning over 15 industries, each with an average of 6.3 tool dependencies. We demonstrate that Graph RAG-Tool Fusion achieves absolute improvements of 71.7% and 22.1% over naïve RAG on ToolLinkOS and ToolSandbox benchmarks, respectively (mAP@10). ToolLinkOS dataset is available at https://github.com/EliasLumer/Graph-RAG-Tool-Fusion-ToolLinkOS
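
The stock-price example above can be sketched as a retrieval step followed by a dependency traversal. This is a simplified illustration, not the paper's implementation: the tool names, the `depends_on` map, and the precomputed similarity scores (standing in for a real vector search) are all hypothetical.

```python
def expand_with_dependencies(seed_tools, depends_on):
    """Return the seed tools plus all transitive dependencies, in first-seen order."""
    ordered, seen = [], set()
    stack = list(reversed(seed_tools))
    while stack:
        tool = stack.pop()
        if tool in seen:  # also guards against dependency cycles
            continue
        seen.add(tool)
        ordered.append(tool)
        stack.extend(reversed(depends_on.get(tool, [])))
    return ordered

# Hypothetical tool knowledge graph (edges point from a tool to what it needs).
depends_on = {
    "get_stock_price": ["get_stock_ticker", "check_internet"],
    "get_stock_ticker": ["check_internet"],
}

# Stand-in for vector retrieval: similarity of each tool description to the query.
similarity = {
    "get_stock_price": 0.91,
    "get_weather": 0.40,
    "get_stock_ticker": 0.35,
    "check_internet": 0.05,
}

seeds = sorted(similarity, key=similarity.get, reverse=True)[:1]
tools = expand_with_dependencies(seeds, depends_on)
```

Similarity alone ranks "get_weather" above both dependencies of the stock-price tool; the traversal step retrieves them regardless of their low scores.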


LLM-assisted Vector Similarity Search

Riyadh, Md, Li, Muqi, Lie, Felix Haryanto, Loh, Jia Long, Mi, Haotian, Bohra, Sayam

arXiv.org Artificial Intelligence

As data retrieval demands become increasingly complex, traditional search methods often fall short in addressing nuanced and conceptual queries. Vector similarity search has emerged as a promising technique for finding semantically similar information efficiently. However, its effectiveness diminishes when handling intricate queries with contextual nuances. This paper explores a hybrid approach combining vector similarity search with Large Language Models (LLMs) to enhance search accuracy and relevance. The proposed two-step solution first employs vector similarity search to shortlist potential matches, followed by an LLM for context-aware ranking of the results. Experiments on structured datasets demonstrate that while vector similarity search alone performs well for straightforward queries, the LLM-assisted approach excels in processing complex queries involving constraints, negations, or conceptual requirements. By leveraging the natural language understanding capabilities of LLMs, this method improves the accuracy of search results for complex tasks without sacrificing efficiency. We also discuss real-world applications and propose directions for future research to refine and scale this technique for diverse datasets and use cases. Original article: https://engineering.grab.com/llm-assisted-vector-similarity-search
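
The two-step flow described above (vector shortlist, then context-aware reranking) might look like the sketch below. The `rerank` argument is a placeholder for an LLM call; `fake_llm_rerank` is a hand-written stub that imitates handling a negation, and every name and data point here is an assumption made for illustration.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def shortlist(query_vec, items, n):
    """Step 1: keep the n items most similar to the query vector."""
    ranked = sorted(items, key=lambda it: cosine(it[1], query_vec), reverse=True)
    return [text for text, _ in ranked[:n]]

def llm_assisted_search(query_text, query_vec, items, rerank, shortlist_size=3, k=1):
    """Step 2: hand the shortlist to a reranker (an LLM in the real system)."""
    candidates = shortlist(query_vec, items, shortlist_size)
    return rerank(query_text, candidates)[:k]

def fake_llm_rerank(query_text, candidates):
    """Stub for an LLM: pushes citrus items last, as if it understood 'not citrus'."""
    citrus = ("orange", "lemon")
    return sorted(candidates, key=lambda text: any(w in text for w in citrus))

# Hypothetical catalogue of (text, embedding) pairs.
items = [
    ("orange juice", [0.9, 0.1]),
    ("lemon tart", [0.8, 0.2]),
    ("apple pie", [0.7, 0.3]),
    ("car tyre", [0.0, 1.0]),
]

vector_only = shortlist([1.0, 0.0], items, 1)
assisted = llm_assisted_search("fruit dessert that is not citrus", [1.0, 0.0], items, fake_llm_rerank)
```

Vector similarity alone returns "orange juice" for the negated query, while the reranked shortlist returns "apple pie"; the reranker only sees the shortlist, so the extra cost is limited to a few candidates per query.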