Goto

Collaborating Authors

 Query Processing


A Proposed Large Language Model-Based Smart Search for Archive System

arXiv.org Artificial Intelligence

This study presents a novel framework for smart search in digital archival systems, leveraging the capabilities of Large Language Models (LLMs) to enhance information retrieval. By employing a Retrieval-Augmented Generation (RAG) approach, the framework enables the processing of natural language queries and transforming non-textual data into meaningful textual representations. The system integrates advanced metadata generation techniques, a hybrid retrieval mechanism, a router query engine, and robust response synthesis, the results proved search precision and relevance. We present the architecture and implementation of the system and evaluate its performance in four experiments concerning LLM efficiency, hybrid retrieval optimizations, multilingual query handling, and the impacts of individual components. Obtained results show significant improvements over conventional approaches and have demonstrated the potential of AI-powered systems to transform modern archival practices.


A Survey of Query Optimization in Large Language Models

arXiv.org Artificial Intelligence

\textit{Query Optimization} (QO) refers to techniques aimed at enhancing the efficiency and quality of Large Language Models (LLMs) in understanding and answering queries, especially complex ones in scenarios like Retrieval-Augmented Generation (RAG). Specifically, RAG mitigates the limitations of LLMs by dynamically retrieving and leveraging up-to-date relevant information, which provides a cost-effective solution to the challenge of LLMs producing plausible but potentially inaccurate responses. Recently, as RAG evolves and incorporates multiple components that influence its performance, QO has emerged as a critical element, playing a pivotal role in determining the effectiveness of RAG's retrieval stage in accurately sourcing the necessary multiple pieces of evidence to answer queries correctly. In this paper, we trace the evolution of QO techniques by summarizing and analyzing significant studies. Through an organized framework and categorization, we aim to consolidate existing QO techniques in RAG, elucidate their technological foundations, and highlight their potential to enhance the versatility and applications of LLMs.


Semantic Web: Past, Present, and Future

arXiv.org Artificial Intelligence

Ever since the vision was formulated, the Semantic Web has inspired many generations of innovations. Semantic technologies have been used to share vast amounts of information on the Web, enhance them with semantics to give them meaning, and enable inference and reasoning on them. Throughout the years, semantic technologies, and in particular knowledge graphs, have been used in search engines, data integration, enterprise settings, and machine learning. In this paper, we recap the classical concepts and foundations of the Semantic Web as well as modern and recent concepts and applications, building upon these foundations. The classical topics we cover include knowledge representation, creating and validating knowledge on the Web, reasoning and linking, and distributed querying. We enhance this classical view of the so-called ``Semantic Web Layer Cake'' with an update of recent concepts that include provenance, security and trust, as well as a discussion of practical impacts from industry-led contributions. We conclude with an outlook on the future directions of the Semantic Web.


Learning Low Degree Hypergraphs

arXiv.org Artificial Intelligence

We study the problem of learning a hypergraph via edge detecting queries. In this problem, a learner queries subsets of vertices of a hidden hypergraph and observes whether these subsets contain an edge or not. In general, learning a hypergraph with $m$ edges of maximum size $d$ requires $\Omega((2m/d)^{d/2})$ queries. In this paper, we aim to identify families of hypergraphs that can be learned without suffering from a query complexity that grows exponentially in the size of the edges. We show that hypermatchings and low-degree near-uniform hypergraphs with $n$ vertices are learnable with poly$(n)$ queries. For learning hypermatchings (hypergraphs of maximum degree $ 1$), we give an $O(\log^3 n)$-round algorithm with $O(n \log^5 n)$ queries. We complement this upper bound by showing that there are no algorithms with poly$(n)$ queries that learn hypermatchings in $o(\log \log n)$ adaptive rounds. For hypergraphs with maximum degree $\Delta$ and edge size ratio $\rho$, we give a non-adaptive algorithm with $O((2n)^{\rho \Delta+1}\log^2 n)$ queries. To the best of our knowledge, these are the first algorithms with poly$(n, m)$ query complexity for learning non-trivial families of hypergraphs that have a super-constant number of edges of super-constant size.


Query pipeline optimization for cancer patient question answering systems

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) mitigates hallucination in Large Language Models (LLMs) by using query pipelines to retrieve relevant external information and grounding responses in retrieved knowledge. However, query pipeline optimization for cancer patient question-answering (CPQA) systems requires separately optimizing multiple components with domain-specific considerations. We propose a novel three-aspect optimization approach for the RAG query pipeline in CPQA systems, utilizing public biomedical databases like PubMed and PubMed Central. Our optimization includes: (1) document retrieval, utilizing a comparative analysis of NCBI resources and introducing Hybrid Semantic Real-time Document Retrieval (HSRDR); (2) passage retrieval, identifying optimal pairings of dense retrievers and rerankers; and (3) semantic representation, introducing Semantic Enhanced Overlap Segmentation (SEOS) for improved contextual understanding. On a custom-developed dataset tailored for cancer-related inquiries, our optimized RAG approach improved the answer accuracy of Claude-3-haiku by 5.24% over chain-of-thought prompting and about 3% over a naive RAG setup. This study highlights the importance of domain-specific query optimization in realizing the full potential of RAG and provides a robust framework for building more accurate and reliable CPQA systems, advancing the development of RAG-based biomedical systems.


SynthCypher: A Fully Synthetic Data Generation Framework for Text-to-Cypher Querying in Knowledge Graphs

arXiv.org Artificial Intelligence

Cypher, the query language for Neo4j graph databases, plays a critical role in enabling graph-based analytics and data exploration. While substantial research has been dedicated to natural language to SQL query generation (Text2SQL), the analogous problem for graph databases referred to as Text2Cypher remains underexplored. In this work, we introduce SynthCypher, a fully synthetic and automated data generation pipeline designed to address this gap. SynthCypher employs a novel LLMSupervised Generation-Verification framework, ensuring syntactically and semantically correct Cypher queries across diverse domains and query complexities. Using this pipeline, we create SynthCypher Dataset, a large-scale benchmark containing 29.8k Text2Cypher instances. Fine-tuning open-source large language models (LLMs), including LLaMa-3.1- 8B, Mistral-7B, and QWEN-7B, on SynthCypher yields significant performance improvements of up to 40% on the Text2Cypher test set and 30% on the SPIDER benchmark adapted for graph databases. This work demonstrates that high-quality synthetic data can effectively advance the state-of-the-art in Text2Cypher tasks.


BioRAGent: A Retrieval-Augmented Generation System for Showcasing Generative Query Expansion and Domain-Specific Search for Scientific Q&A

arXiv.org Artificial Intelligence

We present BioRAGent, an interactive web-based retrieval-augmented generation (RAG) system for biomedical question answering. The system uses large language models (LLMs) for query expansion, snippet extraction, and answer generation while maintaining transparency through citation links to the source documents and displaying generated queries for further editing. Building on our successful participation in the BioASQ 2024 challenge, we demonstrate how few-shot learning with LLMs can be effectively applied for a professional search setting. The system supports both direct short paragraph style responses and responses with inline citations. Our demo is available online, and the source code is publicly accessible through GitHub.


Specifications: The missing link to making the development of LLM systems an engineering discipline

arXiv.org Artificial Intelligence

Despite the significant strides made by generative AI in just a few short years, its future progress is constrained by the challenge of building modular and robust systems. This capability has been a cornerstone of past technological revolutions, which relied on combining components to create increasingly sophisticated and reliable systems. Cars, airplanes, computers, and software consist of components-such as engines, wheels, CPUs, and libraries-that can be assembled, debugged, and replaced. A key tool for building such reliable and modular systems is specification: the precise description of the expected behavior, inputs, and outputs of each component. However, the generality of LLMs and the inherent ambiguity of natural language make defining specifications for LLM-based components (e.g., agents) both a challenging and urgent problem. In this paper, we discuss the progress the field has made so far-through advances like structured outputs, process supervision, and test-time compute-and outline several future directions for research to enable the development of modular and reliable LLM-based systems through improved specifications.


Dynamic Prompt Allocation and Tuning for Continual Test-Time Adaptation

arXiv.org Artificial Intelligence

Continual test-time adaptation (CTTA) has recently emerged to adapt a pre-trained source model to continuously evolving target distributions, which accommodates the dynamic nature of real-world environments. To mitigate the risk of catastrophic forgetting in CTTA, existing methods typically incorporate explicit regularization terms to constrain the variation of model parameters. However, they cannot fundamentally resolve catastrophic forgetting because they rely on a single shared model to adapt across all target domains, which inevitably leads to severe inter-domain interference. In this paper, we introduce learnable domain-specific prompts that guide the model to adapt to corresponding target domains, thereby partially disentangling the parameter space of different domains. In the absence of domain identity for target samples, we propose a novel dynamic Prompt AllocatIon aNd Tuning (PAINT) method, which utilizes a query mechanism to dynamically determine whether the current samples come from a known domain or an unexplored one. For known domains, the corresponding domain-specific prompt is directly selected, while for previously unseen domains, a new prompt is allocated. Prompt tuning is subsequently performed using mutual information maximization along with structural regularization. Extensive experiments on three benchmark datasets demonstrate the effectiveness of our PAINT method for CTTA. We have released our code at https://github.com/Cadezzyr/PAINT.


Multimodal Contextualized Support for Enhancing Video Retrieval System

arXiv.org Artificial Intelligence

Current video retrieval systems, especially those used in competitions, primarily focus on querying individual keyframes or images rather than encoding an entire clip or video segment. However, queries often describe an action or event over a series of frames, not a specific image. This results in insufficient information when analyzing a single frame, leading to less accurate query results. Moreover, extracting embeddings solely from images (keyframes) does not provide enough information for models to encode higher-level, more abstract insights inferred from the video. These models tend to only describe the objects present in the frame, lacking a deeper understanding. In this work, we propose a system that integrates the latest methodologies, introducing a novel pipeline that extracts multimodal data, and incorporate information from multiple frames within a video, enabling the model to abstract higher-level information that captures latent meanings, focusing on what can be inferred from the video clip, rather than just focusing on object detection in one single image.