AITopics | Question Answering

Collaborating Authors

Question Answering

"Questions are asked and answered every day. Question answering (QA) technology aims to deliver the same facility online. It goes further than the more familiar search based on keywords (as in Google, Yahoo, and other search engines), in attempting to recognize what a question expresses and to respond with an actual answer. This simplifies things for users in two ways. First, questions do not often translate into a simple list of keywords. ...Second, QA takes responsibility for providing answers, rather than a searchable list of links to potentially relevant documents (web pages), highlighted by snippets of text that show how the query matched the documents."
– from Bonnie Webber & Nick Webb. Question Answering. In The Handbook of Computational Linguistics and Natural Language Processing. Alexander Clark, Chris Fox, Shalom Lappin (Eds.). Wiley, 2010.

News Overviews Instructional Materials AI-Alerts Classics

PreWoMe: Exploiting Presuppositions as Working Memory for Long Form Question Answering

Han, Wookje, Park, Jinsol, Lee, Kyungjae

arXiv.org Artificial IntelligenceNov-13-2023

Information-seeking questions in long-form question answering (LFQA) often prove misleading due to ambiguity or false presupposition in the question. While many existing approaches handle misleading questions, they are tailored to limited questions, which are insufficient in a real-world setting with unpredictable input characteristics. In this work, we propose PreWoMe, a unified approach capable of handling any type of information-seeking question. The key idea of PreWoMe involves extracting presuppositions in the question and exploiting them as working memory to generate feedback and action about the question. Our experiment shows that PreWoMe is effective not only in tackling misleading questions but also in handling normal ones, thereby demonstrating the effectiveness of leveraging presuppositions, feedback, and action for real-world QA settings.

misleading question, presupposition, prewome, (13 more...)

arXiv.org Artificial Intelligence

2310.16147

Country:

Europe > Italy (0.05)
Africa > Ethiopia (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > California > Los Angeles County > Inglewood (0.04)

Genre: Research Report (0.64)

Industry:

Leisure & Entertainment > Sports > Boxing (0.94)
Media > Film (0.94)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.81)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)

Add feedback

Navigating the Grey Area: How Expressions of Uncertainty and Overconfidence Affect Language Models

Zhou, Kaitlyn, Jurafsky, Dan, Hashimoto, Tatsunori

arXiv.org Artificial IntelligenceNov-13-2023

The increased deployment of LMs for real-world tasks involving knowledge and facts makes it important to understand model epistemology: what LMs think they know, and how their attitudes toward that knowledge are affected by language use in their inputs. Here, we study an aspect of model epistemology: how epistemic markers of certainty, uncertainty, or evidentiality like "I'm sure it's", "I think it's", or "Wikipedia says it's" affect models, and whether they contribute to model failures. We develop a typology of epistemic markers and inject 50 markers into prompts for question answering. We find that LMs are highly sensitive to epistemic markers in prompts, with accuracies varying more than 80%. Surprisingly, we find that expressions of high certainty result in a 7% decrease in accuracy as compared to low certainty expressions; similarly, factive verbs hurt performance, while evidentials benefit performance. Our analysis of a popular pretraining dataset shows that these markers of uncertainty are associated with answers on question-answering websites, while markers of certainty are associated with questions. These associations may suggest that the behavior of LMs is based on mimicking observed language use, rather than truly reflecting epistemic uncertainty.

accuracy, expression, factive, (14 more...)

arXiv.org Artificial Intelligence

2302.13439

Country:

Europe > France (0.05)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(12 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Education (0.46)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.70)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.70)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback

AVeriTeC: A Dataset for Real-world Claim Verification with Evidence from the Web

Schlichtkrull, Michael, Guo, Zhijiang, Vlachos, Andreas

arXiv.org Artificial IntelligenceNov-8-2023

Existing datasets for automated fact-checking have substantial limitations, such as relying on artificial claims, lacking annotations for evidence and intermediate reasoning, or including evidence published after the claim. In this paper we introduce AVeriTeC, a new dataset of 4,568 real-world claims covering fact-checks by 50 different organizations. Each claim is annotated with question-answer pairs supported by evidence available online, as well as textual justifications explaining how the evidence combines to produce a verdict. Through a multi-round annotation process, we avoid common pitfalls including context dependence, evidence insufficiency, and temporal leakage, and reach a substantial inter-annotator agreement of $\kappa=0.619$ on verdicts. We develop a baseline as well as an evaluation scheme for verifying claims through several question-answering steps against the open web.

annotator, computational linguistic, fact-checking article, (13 more...)

arXiv.org Artificial Intelligence

2305.13117

Country:

Africa > Nigeria (0.14)
Africa > Kenya (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
(38 more...)

Genre: Research Report > Experimental Study (0.67)

Industry:

Media > News (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Energy (0.93)
(4 more...)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Add feedback

SEMQA: Semi-Extractive Multi-Source Question Answering

Schuster, Tal, Lelkes, Adam D., Sun, Haitian, Gupta, Jai, Berant, Jonathan, Cohen, William W., Metzler, Donald

arXiv.org Artificial IntelligenceNov-8-2023

Recently proposed long-form question answering (QA) systems, supported by large language models (LLMs), have shown promising capabilities. Yet, attributing and verifying their generated abstractive answers can be difficult, and automatically evaluating their accuracy remains an ongoing challenge. In this work, we introduce a new QA task for answering multi-answer questions by summarizing multiple diverse sources in a semi-extractive fashion. Specifically, Semi-extractive Multi-source QA (SEMQA) requires models to output a comprehensive answer, while mixing factual quoted spans -- copied verbatim from given input sources -- and non-factual free-text connectors that glue these spans together into a single cohesive passage. This setting bridges the gap between the outputs of well-grounded but constrained extractive QA systems and more fluent but harder to attribute fully abstractive answers. Particularly, it enables a new mode for language models that leverages their advanced language generation capabilities, while also producing fine in-line attributions by-design that are easy to verify, interpret, and evaluate. To study this task, we create the first dataset of this kind, QuoteSum, with human-written semi-extractive answers to natural and generated questions, and define text-based evaluation metrics. Experimenting with several LLMs in various settings, we find this task to be surprisingly challenging, demonstrating the importance of QuoteSum for developing and studying such consolidation capabilities.

semqa

arXiv.org Artificial Intelligence

2311.04886

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.73)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

Add feedback

PDFTriage: Question Answering over Long, Structured Documents

Saad-Falcon, Jon, Barrow, Joe, Siu, Alexa, Nenkova, Ani, Yoon, David Seunghyun, Rossi, Ryan A., Dernoncourt, Franck

arXiv.org Artificial IntelligenceNov-8-2023

Large Language Models (LLMs) have issues with document question answering (QA) in situations where the document is unable to fit in the small context length of an LLM. To overcome this issue, most existing works focus on retrieving the relevant context from the document, representing them as plain text. However, documents such as PDFs, web pages, and presentations are naturally structured with different pages, tables, sections, and so on. Representing such structured documents as plain text is incongruous with the user's mental model of these documents with rich structure. When a system has to query the document for context, this incongruity is brought to the fore, and seemingly trivial questions can trip up the QA system. To bridge this fundamental gap in handling structured documents, we propose an approach called PDFTriage that enables models to retrieve the context based on either structure or content. Our experiments demonstrate the effectiveness of the proposed PDFTriage-augmented models across several classes of questions where existing retrieval-augmented LLMs fail. To facilitate further research on this fundamental problem, we release our benchmark dataset consisting of 900+ human-generated questions over 80 structured documents from 10 different categories of question types for document QA. Our code and datasets will be released soon on Github.

pdftriage, structured document

arXiv.org Artificial Intelligence

2309.08872

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.93)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.60)

Add feedback

Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning

Li, Ruosen, Du, Xinya

arXiv.org Artificial IntelligenceNov-7-2023

Neural models, including large language models (LLMs), achieve superior performance on multi-hop question-answering. To elicit reasoning capabilities from LLMs, recent works propose using the chain-of-thought (CoT) mechanism to generate both the reasoning chain and the answer, which enhances the model's capabilities in conducting multi-hop reasoning. However, several challenges still remain: such as struggling with inaccurate reasoning, hallucinations, and lack of interpretability. On the other hand, information extraction (IE) identifies entities, relations, and events grounded to the text. The extracted structured information can be easily interpreted by humans and machines (Grishman, 2019). In this work, we investigate constructing and leveraging extracted semantic structures (graphs) for multi-hop question answering, especially the reasoning process. Empirical results and human evaluations show that our framework: generates more faithful reasoning chains and substantially improves the QA performance on two benchmark datasets. Moreover, the extracted structures themselves naturally provide grounded explanations that are preferred by humans, as compared to the generated reasoning chains and saliency-based explanations.

graph, reasoning, reasoning chain, (16 more...)

arXiv.org Artificial Intelligence

2311.03734

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Germany (0.04)
North America > Dominican Republic (0.04)
(7 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.82)

Add feedback

Question Answering for Electronic Health Records: A Scoping Review of datasets and models

Bardhan, Jayetri, Roberts, Kirk, Wang, Daisy Zhe

arXiv.org Artificial IntelligenceNov-7-2023

Question Answering (QA) systems on patient-related data can assist both clinicians and patients. They can, for example, assist clinicians in decision-making and enable patients to have a better understanding of their medical history. Significant amounts of patient data are stored in Electronic Health Records (EHRs), making EHR QA an important research area. In EHR QA, the answer is obtained from the medical record of the patient. Because of the differences in data format and modality, this differs greatly from other medical QA tasks that employ medical websites or scientific papers to retrieve answers, making it critical to research EHR question answering. This study aimed to provide a methodological review of existing works on QA over EHRs. We searched for articles from January 1st, 2005 to September 30th, 2023 in four digital sources including Google Scholar, ACL Anthology, ACM Digital Library, and PubMed to collect relevant publications on EHR QA. 4111 papers were identified for our study, and after screening based on our inclusion criteria, we obtained a total of 47 papers for further study. Out of the 47 papers, 25 papers were about EHR QA datasets, and 37 papers were about EHR QA models. It was observed that QA on EHRs is relatively new and unexplored. Most of the works are fairly recent. Also, it was observed that emrQA is by far the most popular EHR QA dataset, both in terms of citations and usage in other papers. Furthermore, we identified the different models used in EHR QA along with the evaluation metrics used for these models.

dataset and model, electronic health record, scoping review

arXiv.org Artificial Intelligence

2310.08759

Genre: Research Report (0.40)

Industry: Health & Medicine > Health Care Technology > Medical Record (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.80)

Add feedback

Adapting Pre-trained Generative Models for Extractive Question Answering

Mallick, Prabir, Nayak, Tapas, Bhattacharya, Indrajit

arXiv.org Artificial IntelligenceNov-6-2023

Pre-trained Generative models such as BART, T5, etc. have gained prominence as a preferred method for text generation in various natural language processing tasks, including abstractive long-form question answering (QA) and summarization. However, the potential of generative models in extractive QA tasks, where discriminative models are commonly employed, remains largely unexplored. Discriminative models often encounter challenges associated with label sparsity, particularly when only a small portion of the context contains the answer. The challenge is more pronounced for multi-span answers. In this work, we introduce a novel approach that uses the power of pre-trained generative models to address extractive QA tasks by generating indexes corresponding to context tokens or sentences that form part of the answer. Through comprehensive evaluations on multiple extractive QA datasets, including MultiSpanQA, BioASQ, MASHQA, and WikiQA, we demonstrate the superior performance of our proposed approach compared to existing state-of-the-art models.

base 0, dataset, span, (16 more...)

arXiv.org Artificial Intelligence

2311.02961

Country: Asia > India (0.04)

Genre: Research Report > Promising Solution (0.55)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.88)

Add feedback

Robust Training for Conversational Question Answering Models with Reinforced Reformulation Generation

Kaiser, Magdalena, Roy, Rishiraj Saha, Weikum, Gerhard

arXiv.org Artificial IntelligenceNov-6-2023

Models for conversational question answering (ConvQA) over knowledge graphs (KGs) are usually trained and tested on benchmarks of gold QA pairs. This implies that training is limited to surface forms seen in the respective datasets, and evaluation is on a small set of held-out questions. Through our proposed framework REIGN, we take several steps to remedy this restricted learning setup. First, we systematically generate reformulations of training questions to increase robustness of models to surface form variations. This is a particularly challenging problem, given the incomplete nature of such questions. Second, we guide ConvQA models towards higher performance by feeding it only those reformulations that help improve their answering quality, using deep reinforcement learning. Third, we demonstrate the viability of training major model components on one benchmark and applying them zero-shot to another. Finally, for a rigorous evaluation of robustness for trained models, we use and release large numbers of diverse reformulations generated by prompting GPT for benchmark test sets (resulting in 20x increase in sizes). Our findings show that ConvQA models with robust training via reformulations, significantly outperform those with standard training from gold QA pairs only.

category, convqa model, reformulation, (14 more...)

arXiv.org Artificial Intelligence

2310.13505

Country:

North America > Mexico > Yucatán > Mérida (0.05)
Europe > Germany > Saarland (0.04)
South America > Brazil (0.04)
(2 more...)

Genre: Research Report > New Finding (0.68)

Industry:

Media > Music (0.93)
Leisure & Entertainment > Sports > Soccer (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Optimizing Retrieval-augmented Reader Models via Token Elimination

Berchansky, Moshe, Izsak, Peter, Caciularu, Avi, Dagan, Ido, Wasserblat, Moshe

arXiv.org Artificial IntelligenceNov-5-2023

Fusion-in-Decoder (FiD) is an effective retrieval-augmented language model applied across a variety of open-domain tasks, such as question answering, fact checking, etc. In FiD, supporting passages are first retrieved and then processed using a generative model (Reader), which can cause a significant bottleneck in decoding time, particularly with long outputs. In this work, we analyze the contribution and necessity of all the retrieved passages to the performance of reader models, and propose eliminating some of the retrieved information, at the token level, that might not contribute essential information to the answer generation process. We demonstrate that our method can reduce run-time by up to 62.2%, with only a 2% reduction in performance, and in some cases, even improve the performance results.

answer length, dataset, latency, (15 more...)

arXiv.org Artificial Intelligence

2310.13682

Country:

Asia > Middle East > Israel (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Nephrology (0.68)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.55)

Add feedback