
Collaborating Authors: single-hop question


VDocRAG: Retrieval-Augmented Generation over Visually-Rich Documents

Tanaka, Ryota, Iki, Taichi, Hasegawa, Taku, Nishida, Kyosuke, Saito, Kuniko, Suzuki, Jun

arXiv.org Artificial Intelligence

We aim to develop a retrieval-augmented generation (RAG) framework that answers questions over a corpus of visually-rich documents presented in mixed modalities (e.g., charts, tables) and diverse formats (e.g., PDF, PPTX). In this paper, we introduce a new RAG framework, VDocRAG, which can directly understand varied documents and modalities in a unified image format, preventing the information loss that occurs when documents are parsed into text. To improve performance, we propose novel self-supervised pre-training tasks that adapt large vision-language models for retrieval by compressing visual information into dense token representations while aligning them with textual content in documents. Furthermore, we introduce OpenDocVQA, the first unified collection of open-domain document visual question answering datasets, encompassing diverse document types and formats. OpenDocVQA provides a comprehensive resource for training and evaluating retrieval and question answering models on visually-rich documents in an open-domain setting. Experiments show that VDocRAG substantially outperforms conventional text-based RAG and has strong generalization capability, highlighting the potential of an effective RAG paradigm for real-world documents.
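
As a rough illustration of the retrieve-then-generate data flow the abstract describes, here is a minimal sketch of image-native retrieval, assuming toy stand-ins throughout: toy_embed and the fake page corpus are hypothetical placeholders, not the paper's pre-trained vision-language encoder or its dense token compression. The point is only the pipeline shape: pages stay images end to end, so nothing is lost to text parsing.

```python
import math

def toy_embed(data: bytes, dim: int = 8) -> list[float]:
    # Deterministic-per-process toy embedding; stands in for a VLM that
    # compresses a page image into dense token representations.
    h = hash(data)
    return [((h >> (4 * i)) & 0xF) / 15.0 for i in range(dim)]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Every document page (PDF, PPTX, ...) is kept as a rendered image and
# indexed by one dense vector; no text parsing happens anywhere.
pages = {f"page_{i}": f"rendered image bytes {i}".encode() for i in range(4)}
index = {pid: toy_embed(img) for pid, img in pages.items()}

def retrieve(question: str, k: int = 2) -> list[str]:
    q = toy_embed(question.encode())  # query shares the embedding space
    return sorted(index, key=lambda pid: -cosine(q, index[pid]))[:k]

print(retrieve("What does the revenue chart show?"))
```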


BRIEF: Bridging Retrieval and Inference for Multi-hop Reasoning via Compression

Li, Yuankai, Gu, Jia-Chen, Wu, Di, Chang, Kai-Wei, Peng, Nanyun

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) can supplement large language models (LLMs) by integrating external knowledge. However, as the number of retrieved documents increases, the input length to LLMs grows linearly, causing a dramatic increase in latency and a degradation in long-context understanding. This is particularly serious for multi-hop questions that require a chain of reasoning across documents. To accelerate inference, reduce costs, and minimize distractions, this paper presents BRIEF (Bridging Retrieval and Inference through Evidence Fusion), a lightweight approach that performs query-aware multi-hop reasoning by compressing retrieved documents into highly dense textual summaries for integration into in-context learning. To enable learning compression for multi-hop reasoning, we curate synthetic data by extracting atomic proposition expressions that encapsulate distinct factoids from the source documents and composing them into synthetic summaries. Based on our synthetic data, built entirely with open-source models, BRIEF generates more concise summaries and enables a range of LLMs to achieve exceptional open-domain question answering (QA) performance. For example, on HotpotQA, BRIEF doubles the compression rate of the state-of-the-art baseline while outperforming it by 3.00% EM and 4.16% F1 with Flan-UL2 as the reader LM. It also generates more concise summaries than proprietary GPT-3.5 while demonstrating nearly identical QA performance.
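
The compression step can be pictured with a toy scorer. In the sketch below, keyword overlap stands in for BRIEF's trained compressor (which is learned from synthetic proposition-based summaries); only the overall shape, compress first and then hand the short summary to the reader LM in-context, reflects the abstract.

```python
import re

def compress(query: str, docs: list[str], budget: int = 2) -> str:
    # Toy compressor: rank sentences ("propositions") by term overlap with
    # the query and keep a small budget as the dense summary.
    sents = [s.strip() for d in docs
             for s in re.split(r"(?<=[.!?])\s+", d) if s.strip()]
    q_terms = set(re.findall(r"\w+", query.lower()))
    ranked = sorted(sents, reverse=True,
                    key=lambda s: len(q_terms & set(re.findall(r"\w+", s.lower()))))
    return " ".join(ranked[:budget])

query = "Where was the painter of the Mona Lisa born?"
docs = ["The Mona Lisa was painted by Leonardo da Vinci. It hangs in the Louvre.",
        "Leonardo da Vinci was born near Vinci, in Tuscany."]
summary = compress(query, docs)  # drops the distractor sentence
print(f"Context: {summary}\nQuestion: {query}\nAnswer:")
```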


What are the Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets? Insights and Best Practices

Chen, Zhi, Chen, Qiguang, Qin, Libo, Guo, Qipeng, Lv, Haijun, Zou, Yicheng, Che, Wanxiang, Yan, Hang, Chen, Kai, Lin, Dahua

arXiv.org Artificial Intelligence

Recent advancements in large language models (LLMs) with extended context windows have significantly improved tasks such as information extraction, question answering, and complex planning scenarios. To achieve success on long-context tasks, a large amount of work has gone into enhancing models' long-context capabilities through synthetic data. Existing methods typically use the Self-Instruct framework to generate instruction tuning data for long-context capability improvement. However, our preliminary experiments indicate that less than 35% of generated samples are multi-hop, and more than 40% exhibit poor quality, limiting comprehensive understanding and further research. To improve the quality of synthetic data, we propose the Multi-agent Interactive Multi-hop Generation (MIMG) framework, incorporating a Quality Verification Agent, a Single-hop Question Generation Agent, a Multiple Question Sampling Strategy, and a Multi-hop Question Merger Agent. This framework improves data quality, with the proportion of high-quality, multi-hop, and diverse data exceeding 85%. Furthermore, we systematically investigate strategies for document selection, question merging, and validation techniques through extensive experiments across various models. Our findings show that our synthetic high-quality long-context instruction data significantly enhances model performance, even surpassing models trained on larger amounts of human-annotated data. Our code is available at: https://github.com/WowCZ/LongMIT. Research on developing long-context LLMs has predominantly focused on extending the context window (Ding et al., 2024; Jin et al., 2024; Peng et al., 2024). Nevertheless, in practical applications, simply expanding the context window proves inadequate (Hsieh et al., 2024; Huang, 2024); there is a pressing need for training that optimizes utilization of long context (Zhang et al., 2024), especially through instruction tuning (Fu et al., 2024b).
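
A skeletal view of the agent loop, under loud assumptions: every "agent" below is a rule-based placeholder (a regex rewrite, a trivial quality check), whereas the framework uses interacting LLM agents. It shows only how a verified pair of single-hop questions sharing a bridge entity can be merged into one multi-hop question.

```python
import re

def describe(question: str) -> str:
    # Toy rewrite of a single-hop question into a noun phrase; the real
    # Multi-hop Question Merger Agent would delegate this to an LLM.
    m = re.match(r"Who directed (.+)\?", question)
    return f"the director of {m.group(1)}" if m else f"the answer to '{question}'"

def verify(question: str, answer: str) -> bool:
    # Toy Quality Verification Agent: well-formed and non-empty.
    return question.endswith("?") and bool(answer.strip())

def merge(q1: str, a1: str, q2: str) -> str:
    # Splice the first hop into the second by replacing the bridge entity.
    return q2.replace(a1, describe(q1))

q1, a1 = "Who directed Jaws?", "Steven Spielberg"
q2, a2 = "Where was Steven Spielberg born?", "Cincinnati"
if verify(q1, a1) and verify(q2, a2) and a1 in q2:  # bridge entity links hops
    print(merge(q1, a1, q2), "->", a2)
    # Where was the director of Jaws born? -> Cincinnati
```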


Investigating Multi-Hop Factual Shortcuts in Knowledge Editing of Large Language Models

Ju, Tianjie, Chen, Yijin, Yuan, Xinwei, Zhang, Zhuosheng, Du, Wei, Zheng, Yubin, Liu, Gongshen

arXiv.org Artificial Intelligence

Recent work has showcased the powerful capability of large language models (LLMs) in recalling knowledge and reasoning. However, the reliability of LLMs in combining these two capabilities to reason through multi-hop facts has not been widely explored. This paper systematically investigates the possibility that LLMs exploit shortcuts based on direct connections between the initial and terminal entities of multi-hop knowledge. We first explore the existence of factual shortcuts through Knowledge Neurons, revealing that: (i) the strength of factual shortcuts is highly correlated with the frequency of co-occurrence of initial and terminal entities in the pre-training corpora; (ii) few-shot prompting leverages more shortcuts in answering multi-hop questions than chain-of-thought prompting. We then analyze the risks posed by factual shortcuts from the perspective of multi-hop knowledge editing. Analysis shows that approximately 20% of failures are attributable to shortcuts, and that the initial and terminal entities in these failure instances usually have higher co-occurrence in the pre-training corpus. Finally, we propose erasing shortcut neurons to mitigate the associated risks and find that this approach significantly reduces failures in multi-hop knowledge editing caused by shortcuts.
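
The proposed mitigation, erasing shortcut neurons, can be sketched mechanically. The snippet below shows only the erasure step on a stand-alone linear layer; the neuron indices are hypothetical, and the attribution procedure (Knowledge Neurons) that would actually identify them is not reproduced.

```python
import torch

mlp = torch.nn.Linear(16, 16)
shortcut_neurons = [3, 7]  # hypothetical output of an attribution pass

with torch.no_grad():
    for i in shortcut_neurons:
        mlp.weight[i].zero_()  # erase the neuron's incoming weights
        mlp.bias[i].zero_()    # and its bias, silencing its activation

out = mlp(torch.randn(2, 16))
assert torch.all(out[:, shortcut_neurons] == 0)  # erased neurons stay silent
```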


DIVKNOWQA: Assessing the Reasoning Ability of LLMs via Open-Domain Question Answering over Knowledge Base and Text

Zhao, Wenting, Liu, Ye, Niu, Tong, Wan, Yao, Yu, Philip S., Joty, Shafiq, Zhou, Yingbo, Yavuz, Semih

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have exhibited impressive generation capabilities, but they suffer from hallucinations when solely relying on their internal knowledge, especially when answering questions that require less commonly known information. Retrieval-augmented LLMs have emerged as a potential solution to ground LLMs in external knowledge. Nonetheless, recent approaches have primarily emphasized retrieval from unstructured text corpora, owing to its seamless integration into prompts. When using structured data such as knowledge graphs, most methods simplify it into natural text, neglecting the underlying structures. Moreover, a significant gap in the current landscape is the absence of a realistic benchmark for evaluating the effectiveness of grounding LLMs on heterogeneous knowledge sources (e.g., knowledge base and text). To fill this gap, we have curated a comprehensive dataset that poses two unique challenges: (1) Two-hop multi-source questions that require retrieving information from both open-domain structured and unstructured knowledge sources; retrieving information from structured knowledge sources is a critical component in correctly answering the questions. (2) The generation of symbolic queries (e.g., SPARQL for Wikidata) is a key requirement, which adds another layer of challenge. Our dataset is created using a combination of automatic generation through predefined reasoning chains and human annotation. We also introduce a novel approach that leverages multiple retrieval tools, including text passage retrieval and symbolic language-assisted retrieval. Our model outperforms previous approaches by a significant margin, demonstrating its effectiveness in addressing the above-mentioned reasoning challenges.
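
A toy rendering of the two-hop, multi-source plan: hop 1 is a symbolic lookup standing in for SPARQL over Wikidata, hop 2 a retrieval over unstructured passages. The KB, corpus, and both retrieval functions are illustrative stand-ins, not the paper's tools.

```python
triples = {("Hamlet", "author"): "William Shakespeare"}
passages = ["William Shakespeare was born in Stratford-upon-Avon."]

def kb_lookup(subject: str, relation: str) -> str | None:
    # Stand-in for generating and executing a SPARQL query against a KB.
    return triples.get((subject, relation))

def text_retrieve(entity: str) -> str | None:
    # Stand-in for a passage retriever over an unstructured corpus.
    return next((p for p in passages if entity in p), None)

# Question: "Where was the author of Hamlet born?"
bridge = kb_lookup("Hamlet", "author")                # structured hop
evidence = text_retrieve(bridge) if bridge else None  # unstructured hop
print(bridge, "->", evidence)
```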


Understanding and Improving Zero-shot Multi-hop Reasoning in Generative Question Answering

Jiang, Zhengbao, Araki, Jun, Ding, Haibo, Neubig, Graham

arXiv.org Artificial Intelligence

Generative question answering (QA) models generate answers to questions either solely based on the parameters of the model (the closed-book setting) or by additionally retrieving relevant evidence (the open-book setting). Generative QA models can answer some relatively complex questions, but the mechanism through which they do so is still poorly understood. We perform several studies aimed at better understanding the multi-hop reasoning capabilities of generative QA models. First, we decompose multi-hop questions into chains of corresponding single-hop questions, and find marked inconsistency between models' answers to the multi-hop questions and to their decomposed chains, even though the two should yield the same final answer. Second, we find that models lack zero-shot multi-hop reasoning ability: when trained only on single-hop questions, models generalize poorly to multi-hop questions. Finally, we demonstrate that it is possible to improve models' zero-shot multi-hop reasoning capacity through two methods that approximate real multi-hop natural language (NL) questions: training on either concatenations of single-hop questions or logical forms (SPARQL). In sum, these results demonstrate that multi-hop reasoning does not emerge naturally in generative QA models, but can be encouraged by advances in training or modeling techniques.
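
The first of the two training approximations, concatenating single-hop questions, is simple enough to sketch. The field names and the connectedness assertion below are illustrative assumptions, not the paper's exact data format.

```python
from dataclasses import dataclass

@dataclass
class SingleHop:
    question: str
    answer: str

def concat_example(hop1: SingleHop, hop2: SingleHop) -> dict[str, str]:
    # The chain is connected only if hop2 consumes hop1's answer.
    assert hop1.answer in hop2.question, "questions do not chain"
    return {"input": f"{hop1.question} {hop2.question}", "target": hop2.answer}

ex = concat_example(
    SingleHop("Who composed the Moonlight Sonata?", "Ludwig van Beethoven"),
    SingleHop("Where was Ludwig van Beethoven born?", "Bonn"),
)
print(ex)  # the model is trained to answer the chain with the final answer
```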


Locate Then Ask: Interpretable Stepwise Reasoning for Multi-hop Question Answering

Wang, Siyuan, Wei, Zhongyu, Fan, Zhihao, Zhang, Qi, Huang, Xuanjing

arXiv.org Artificial Intelligence

Multi-hop reasoning requires aggregating multiple documents to answer a complex question. Existing methods usually decompose the multi-hop question into simpler single-hop questions to illustrate an explainable reasoning process. However, they ignore grounding on the supporting facts of each reasoning step, which tends to generate inaccurate decompositions. In this paper, we propose an interpretable stepwise reasoning framework that incorporates both single-hop supporting sentence identification and single-hop question generation at each intermediate step, and utilizes the inference of the current hop for the next until reasoning out the final result. We employ a unified reader model for both intermediate hop reasoning and final hop inference, and adopt joint optimization for more accurate and robust multi-hop reasoning. We conduct experiments on two benchmark datasets, HotpotQA and 2WikiMultiHopQA. The results show that our method can effectively boost performance and also yields a better interpretable reasoning process without decomposition supervision.
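
A skeletal version of the stepwise loop, with loudly toy components: the overlap-based locator and the capitalized-span reader below are heuristics standing in for the paper's jointly optimized unified reader, and only the hop-to-hop data flow mirrors the framework.

```python
import re

def locate(query: str, sentences: list[str], used: set[int]) -> int:
    # Toy locator: pick the unused sentence with most word overlap.
    q = set(re.findall(r"\w+", query.lower()))
    return max((i for i in range(len(sentences)) if i not in used),
               key=lambda i: len(q & set(re.findall(r"\w+", sentences[i].lower()))))

def read(sentence: str) -> str:
    # Toy reader: take the trailing capitalized span as the hop's answer.
    return re.findall(r"[A-Z][a-z]+(?: [A-Z][a-z]+)*", sentence)[-1]

context = ["Inception was directed by Christopher Nolan.",
           "Christopher Nolan was born in London."]
question = "Where was the director of Inception born?"

used: set[int] = set()
query = question
for hop in (1, 2):
    i = locate(query, context, used)
    used.add(i)
    hop_answer = read(context[i])
    print(f"hop {hop}: support={context[i]!r} -> {hop_answer}")
    query = question + " " + hop_answer  # next hop conditions on this inference
```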


MuSiQue: Multi-hop Questions via Single-hop Question Composition

Trivedi, Harsh, Balasubramanian, Niranjan, Khot, Tushar, Sabharwal, Ashish

arXiv.org Artificial Intelligence

To build challenging multi-hop question answering datasets, we propose a bottom-up, semi-automatic process of constructing multi-hop questions via composition of single-hop questions. Constructing multi-hop questions as compositions of single-hop questions allows us to exercise greater control over the quality of the resulting multi-hop questions. This process allows building a dataset with (i) connected reasoning, where each step needs the answer from a previous step; (ii) minimal train-test leakage, by eliminating even partial overlap of reasoning steps; (iii) variable numbers of hops and composition structures; and (iv) contrasting unanswerable questions created by modifying the context. We use this process to construct a new multi-hop QA dataset, MuSiQue-Ans, with ~25K 2-4 hop questions built from seed questions in 5 existing single-hop datasets. Our experiments demonstrate that MuSiQue is challenging for state-of-the-art QA models (e.g., a human-machine gap of ~30 F1 points), significantly harder than existing datasets (2x the human-machine gap), and substantially less cheatable (e.g., a single-hop model is worse by 30 F1 points). We also build an even more challenging dataset, MuSiQue-Full, consisting of answerable and unanswerable contrast question pairs, where model performance drops by a further 13+ F1 points. For data and code, see https://github.com/stonybrooknlp/musique.
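
The connectedness criterion in (i) lends itself to a small sketch. The Hop structure and the bracketed surface composition below are illustrative; in MuSiQue the composed questions are rewritten by humans, and the check here captures only the idea that each hop must consume the previous answer, so no single-hop shortcut can solve the question.

```python
from dataclasses import dataclass

@dataclass
class Hop:
    question: str
    answer: str

def is_connected(chain: list[Hop]) -> bool:
    # Every later hop must reference the previous hop's answer.
    return all(chain[i - 1].answer in chain[i].question
               for i in range(1, len(chain)))

def compose(chain: list[Hop]) -> str:
    # Crude surface composition: substitute each bridge answer with the
    # question that produces it (humans rewrite these in the real dataset).
    q = chain[-1].question
    for prev in reversed(chain[:-1]):
        q = q.replace(prev.answer, f"[{prev.question}]")
    return q

chain = [Hop("Who painted The Starry Night?", "Vincent van Gogh"),
         Hop("In which country was Vincent van Gogh born?", "the Netherlands")]
assert is_connected(chain)
print(compose(chain))  # In which country was [Who painted The Starry Night?] born?
```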