atomic query
LevelRAG: Enhancing Retrieval-Augmented Generation with Multi-hop Logic Planning over Rewriting Augmented Searchers
Zhang, Zhuocheng, Feng, Yang, Zhang, Min
Retrieval-Augmented Generation (RAG) is a crucial method for mitigating hallucinations in Large Language Models (LLMs) and integrating external knowledge into their responses. Existing RAG methods typically employ query rewriting to clarify the user intent and manage multi-hop logic, while using hybrid retrieval to expand the search scope. However, the tight coupling of query rewriting to the dense retriever limits its compatibility with hybrid retrieval, impeding further RAG performance improvements. To address this challenge, we introduce a high-level searcher that decomposes complex queries into atomic queries, independent of any retriever-specific optimizations. Additionally, to harness the strengths of sparse retrievers for precise keyword retrieval, we have developed a new sparse searcher that employs Lucene syntax to enhance retrieval accuracy. Alongside web and dense searchers, these components seamlessly collaborate within our proposed method, LevelRAG. In LevelRAG, the high-level searcher orchestrates the retrieval logic, while the low-level searchers (sparse, web, and dense) refine the queries for optimal retrieval. This approach enhances both the completeness and accuracy of the retrieval process, overcoming challenges associated with current query rewriting techniques in hybrid retrieval scenarios. Empirical experiments conducted on five datasets, encompassing both single-hop and multi-hop question answering tasks, demonstrate the superior performance of LevelRAG compared to existing RAG methods. Notably, LevelRAG outperforms the state-of-the-art proprietary model, GPT-4o, underscoring its effectiveness and potential impact on the RAG field.
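A minimal sketch of the division of labour between a high-level searcher and low-level searchers, under loud assumptions: the atomic queries are hand-written (standing in for the LLM-based decomposition, with the second hop already using the first hop's answer, Paris), and two toy keyword searchers over a three-document corpus stand in for the sparse, web, and dense searchers. All names (`CORPUS`, `strict_search`, `loose_search`, `high_level_search`) are hypothetical. `strict_search` only imitates the effect of Lucene required (`+term`) clauses; it does not parse Lucene syntax, and none of this is LevelRAG's actual implementation.

```python
from typing import Callable, Dict, List, Set

# Toy corpus standing in for a real index (hypothetical data).
CORPUS: Dict[str, str] = {
    "d1": "The Eiffel Tower is in Paris.",
    "d2": "Paris is the capital of France.",
    "d3": "Gustave Eiffel designed the tower's structure.",
}

Searcher = Callable[[str], List[str]]


def tokens(text: str) -> Set[str]:
    """Lower-cased whitespace tokens with trailing punctuation stripped."""
    return {w.strip(".,?").lower() for w in text.split()}


def strict_search(query: str) -> List[str]:
    """Every query term must match, similar in spirit to Lucene +term clauses."""
    q = tokens(query)
    return [doc_id for doc_id, text in CORPUS.items() if q <= tokens(text)]


def loose_search(query: str) -> List[str]:
    """Any shared term matches: a crude stand-in for a higher-recall searcher."""
    q = tokens(query)
    return [doc_id for doc_id, text in CORPUS.items() if q & tokens(text)]


def high_level_search(atomic_queries: List[str],
                      searchers: Dict[str, Searcher]) -> List[str]:
    """Send each atomic query to every low-level searcher; merge without duplicates."""
    merged: List[str] = []
    for query in atomic_queries:
        for search in searchers.values():
            for doc_id in search(query):
                if doc_id not in merged:
                    merged.append(doc_id)
    return merged


# "Which country has the city of the Eiffel Tower as its capital?" decomposed by hand.
docs = high_level_search(["eiffel tower", "paris capital"],
                         {"sparse": strict_search, "broad": loose_search})
```

Here `docs` is `["d1", "d3", "d2"]`: the strict searcher contributes precise hits, the loose one widens coverage, and the high-level layer never needs to know how either searcher phrases its queries.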
TimelineQA: A Benchmark for Question Answering over Timelines
Tan, Wang-Chiew, Dwivedi-Yu, Jane, Li, Yuliang, Mathias, Lambert, Saeidi, Marzieh, Yan, Jing Nathan, Halevy, Alon Y.
Lifelogs are descriptions of experiences that a person had during their life. Lifelogs are created by fusing data from a multitude of digital services, such as online photos, maps, shopping and content streaming services. Question answering over lifelogs can offer personal assistants a critical resource when they try to provide advice in context. However, obtaining answers to questions over lifelogs is beyond the current state of the art of question answering techniques for a variety of reasons, the most pronounced of which is that lifelogs combine free text with some degree of structure such as temporal and geographical information. We create and publicly release TimelineQA, a benchmark for accelerating progress on querying lifelogs. TimelineQA generates lifelogs of imaginary people. The episodes in the lifelog range from major life episodes such as high school graduation to those that occur on a daily basis such as going for a run. We describe a set of experiments on TimelineQA with several state-of-the-art QA models. Our experiments reveal that for atomic queries, an extractive QA system significantly outperforms a state-of-the-art retrieval-augmented QA system. For multi-hop queries involving aggregates, we show that the best result is obtained with a state-of-the-art table QA technique, assuming the ground truth set of episodes for deriving the answer is available.
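To make the atomic versus aggregate distinction concrete, here is a small sketch over a hypothetical lifelog. The `Episode` schema (date, category, free-text description) and the helper names are assumptions for illustration, not TimelineQA's released format. An atomic query touches a single episode, while an aggregate query must combine several; once the relevant episodes are available as structured rows, the aggregate reduces to a filter and a count, which is in line with table QA doing best when the ground-truth episodes are given.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Episode:
    day: date          # temporal structure
    category: str      # hypothetical episode-type label
    description: str   # free-text part of the episode


LIFELOG: List[Episode] = [
    Episode(date(2010, 6, 12), "graduation", "Graduated from high school"),
    Episode(date(2023, 3, 4), "run", "Went for a 5 km run"),
    Episode(date(2023, 3, 9), "run", "Went for a 10 km run"),
    Episode(date(2023, 4, 1), "run", "Went for a 5 km run"),
]


def when(category: str) -> List[date]:
    """Atomic query, e.g. 'When did I graduate from high school?'"""
    return [e.day for e in LIFELOG if e.category == category]


def count_in_month(category: str, year: int, month: int) -> int:
    """Aggregate query, e.g. 'How many times did I go for a run in March 2023?'"""
    return sum(1 for e in LIFELOG
               if e.category == category and (e.day.year, e.day.month) == (year, month))
```

For example, `when("graduation")` returns `[date(2010, 6, 12)]` and `count_in_month("run", 2023, 3)` returns `2`.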
Complex Query Answering with Neural Link Predictors
Arakelyan, Erik, Daza, Daniel, Minervini, Pasquale, Cochez, Michael
Neural link predictors are immensely useful for identifying missing edges in large scale Knowledge Graphs. However, it is still not clear how to use these models for answering more complex queries that arise in a number of domains, such as queries using logical conjunctions, disjunctions, and existential quantifiers, while accounting for missing edges. In this work, we propose a framework for efficiently answering complex queries on incomplete Knowledge Graphs. We translate each query into an end-to-end differentiable objective, where the truth value of each atom is computed by a pre-trained neural link predictor. We then analyse two solutions to the optimisation problem, including gradient-based and combinatorial search. In our experiments, the proposed approach produces more accurate results than state-of-the-art methods -- black-box neural models trained on millions of generated queries -- without the need for training on a large and diverse set of complex queries. Using orders of magnitude less training data, we obtain relative improvements ranging from 8% up to 40% in Hits@3 across different knowledge graphs containing factual information. Finally, we demonstrate that it is possible to explain the outcome of our model in terms of the intermediate solutions identified for each of the complex query atoms.
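The combinatorial-search idea can be sketched for a two-hop query ?T : ∃V . p1(a, V) ∧ p2(V, T). In this sketch the atom scores are fixed hypothetical numbers standing in for a pre-trained link predictor, conjunction is relaxed with the product t-norm (one common choice; the framework may use others), and the existential is handled by keeping only the top-k substitutions for V and maximising over them. The function and variable names are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def t_norm(a: float, b: float) -> float:
    """Product t-norm: a continuous relaxation of logical AND on [0, 1]."""
    return a * b


def two_hop_scores(s1: Dict[str, float],
                   s2: Dict[Tuple[str, str], float],
                   answers: List[str], k: int) -> Dict[str, float]:
    """Score each answer T for exists V . p1(a, V) and p2(V, T) using top-k search over V."""
    beam = sorted(s1, key=s1.get, reverse=True)[:k]  # k most likely substitutions for V
    return {t: max(t_norm(s1[v], s2.get((v, t), 0.0)) for v in beam) for t in answers}


# Hypothetical link-predictor outputs: S1[v] ~ score of p1(a, v); S2[(v, t)] ~ score of p2(v, t).
S1 = {"v1": 0.9, "v2": 0.4, "v3": 0.1}
S2 = {("v1", "t1"): 0.2, ("v1", "t2"): 0.8,
      ("v2", "t1"): 0.9, ("v2", "t2"): 0.1,
      ("v3", "t1"): 1.0}
scores = two_hop_scores(S1, S2, ["t1", "t2"], k=2)
```

With k = 2 the beam is {v1, v2}, so t2 scores about 0.72 (via v1) and t1 about 0.36 (via v2). The winning substitution for V is also the kind of intermediate solution that makes an answer explainable.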
On Closed World Data Bases
Deductive question-answering systems generally evaluate queries under one of two possible assumptions, which in this paper we refer to as the open and closed world assumptions. The open world assumption corresponds to the usual first-order approach to query evaluation: given a data base DB and a query Q, the only answers to Q are those which obtain from proofs of Q given DB as hypotheses. Under the closed world assumption, certain answers are admitted as a result of failure to find a proof. More specifically, if no proof of a positive ground literal exists, then the negation of that literal is assumed true. In this paper, we show that closed world evaluation of an arbitrary query may be reduced to open world evaluation of so-called atomic queries. We then show that the closed world assumption can lead to inconsistencies, but for Horn data bases no such inconsistencies can arise. Finally, we show how, for Horn data bases under the closed world assumption, purely negative clauses are irrelevant for deductive retrieval and function instead as integrity constraints.
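A minimal sketch of closed world evaluation for a Horn data base, restricted for simplicity to ground (variable-free) rules and conjunctive queries; the general reduction for arbitrary queries is not reproduced here. Provability of each atom (an atomic query) is computed by forward chaining, and a negated literal is accepted exactly when its atom has no proof. A purely negative clause is then used only as an integrity check. The predicate names and helpers are hypothetical.

```python
from typing import FrozenSet, List, Set, Tuple

Atom = Tuple[str, ...]               # ground atom, e.g. ("Supplies", "acme", "bolt")
Rule = Tuple[FrozenSet[Atom], Atom]  # (body atoms, head atom): a ground Horn clause
Literal = Tuple[Atom, bool]          # (atom, True if positive, False if negated)


def least_model(facts: Set[Atom], rules: List[Rule]) -> Set[Atom]:
    """All atoms provable from the Horn data base, by naive forward chaining."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if head not in derived and body <= derived:
                derived.add(head)
                changed = True
    return derived


def cwa_holds(query: List[Literal], provable: Set[Atom]) -> bool:
    """Closed world: a negated atom holds exactly when the atom itself has no proof."""
    return all((atom in provable) == positive for atom, positive in query)


def violates(negative_clause: List[Atom], provable: Set[Atom]) -> bool:
    """A purely negative clause (not A1 or ... or not An) is violated iff every Ai is provable."""
    return all(atom in provable for atom in negative_clause)


facts = {("Supplies", "acme", "bolt")}
rules = [(frozenset({("Supplies", "acme", "bolt")}), ("Supplier", "acme"))]
provable = least_model(facts, rules)

# "Is acme a supplier that does not supply gears?"
query = [(("Supplier", "acme"), True), (("Supplies", "acme", "gear"), False)]
```

Under the closed world assumption `cwa_holds(query, provable)` is `True`, whereas under the open world assumption the negative literal has no proof from DB and the query would not be answered. `violates([("Supplier", "acme"), ("Banned", "acme")], provable)` is `False`, since the negative clause never contributes to retrieval and only flags a violation when all of its atoms become provable.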
First-Order Rewritability of Atomic Queries in Horn Description Logics
Bienvenu, Meghyn (CNRS and Université Paris Sud) | Lutz, Carsten (University of Bremen) | Wolter, Frank (University of Liverpool)
One of the most advanced approaches to querying data in the presence of ontologies is to make use of relational database systems, rewriting the original query and the ontology into a new query that is formulated in SQL or, equivalently, in first-order logic (FO). For ontologies written in many standard description logics (DLs), however, such FO-rewritings are not guaranteed to exist. We study FO-rewritings and their existence for a basic class of queries and for ontologies formulated in Horn DLs such as Horn-SHI and EL. Our results include characterizations of the existence of FO-rewritings, tight complexity bounds for deciding whether an FO-rewriting exists (ExpTime and PSpace), and tight bounds on the (worst-case) size of FO-rewritings, when presented as a union of conjunctive queries.
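A worked example of an FO-rewriting for an atomic query, using a small hypothetical EL ontology (the concept and role names are illustrative, not taken from the paper). Each disjunct of the rewriting is a conjunctive query, so the result is a union of conjunctive queries that a relational engine can evaluate directly, for instance as a SQL `UNION` of `SELECT`s over the data.

```latex
\begin{aligned}
\mathcal{T} &= \{\, \mathsf{Professor} \sqsubseteq \mathsf{Teacher},\;
                 \exists \mathsf{teaches}.\mathsf{Course} \sqsubseteq \mathsf{Teacher} \,\} \\
q(x) &= \mathsf{Teacher}(x) \\
q_{\mathcal{T}}(x) &= \mathsf{Teacher}(x) \lor \mathsf{Professor}(x)
   \lor \exists y\, \bigl(\mathsf{teaches}(x,y) \land \mathsf{Course}(y)\bigr)
\end{aligned}
```

By contrast, with the cyclic axiom $\exists \mathsf{hasParent}.\mathsf{Person} \sqsubseteq \mathsf{Person}$, the query $\mathsf{Person}(x)$ is entailed whenever a $\mathsf{hasParent}$-chain of any length leads from $x$ to a $\mathsf{Person}$; expressing this requires unbounded reachability, so no finite union of conjunctive queries (and no FO-rewriting) exists.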
On closed world data bases
We have introduced the notion of the closed world assumption for deductive question-answering. This says, in effect, "Every positive statement that you don't know to be true may be assumed false". We have then shown how query evaluation under the closed world assumption reduces to the usual first order proof theoretic approach to query evaluation as applied to atomic queries. Finally, we have shown that consistent Horn data bases remain consistent under the closed world assumption and that definite data bases are consistent with the closed world assumption. ACKNOWLEDGMENT This paper was written with the financial support of the National Research Council of Canada under grant A7642. Much of this research was done while the author was visiting at Bolt, Beranek and Newman, Inc., Cambridge, Mass. I wish to thank Craig Bishop for his careful criticism of an earlier draft of this paper.
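The contrast between Horn and non-Horn data bases can be checked by brute force on tiny propositional (ground) examples. The sketch below, whose names and encoding are hypothetical, adds the negation of every atom the data base does not entail and then tests satisfiability by enumerating truth assignments; this is exponential and only meant for a handful of atoms.

```python
from itertools import product
from typing import Dict, FrozenSet, Iterator, List, Tuple

Literal = Tuple[str, bool]   # (propositional atom, True if positive)
Clause = FrozenSet[Literal]  # a disjunction of literals


def models(atoms: List[str], clauses: List[Clause]) -> Iterator[Dict[str, bool]]:
    """Enumerate every truth assignment over `atoms` that satisfies all clauses."""
    for values in product([False, True], repeat=len(atoms)):
        m = dict(zip(atoms, values))
        if all(any(m[a] == positive for a, positive in clause) for clause in clauses):
            yield m


def cwa_consistent(atoms: List[str], clauses: List[Clause]) -> bool:
    """Assume not-A for each atom A the data base does not entail; test satisfiability."""
    assumed = [frozenset({(a, False)}) for a in atoms
               if not all(m[a] for m in models(atoms, clauses))]
    return any(True for _ in models(atoms, clauses + assumed))


# Non-Horn: P or Q. Neither P nor Q is entailed, so the CWA adds not-P and not-Q.
disjunctive = [frozenset({("P", True), ("Q", True)})]
# Horn: P, and P implies Q (written as the clause not-P or Q). Only R gets negated.
horn = [frozenset({("P", True)}), frozenset({("P", False), ("Q", True)})]
```

`cwa_consistent(["P", "Q"], disjunctive)` is `False`, because the assumed negations contradict P ∨ Q, while `cwa_consistent(["P", "Q", "R"], horn)` is `True`.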