unanswerability
Detecting (Un)answerability in Large Language Models with Linear Directions
Lavi, Maor Juliet, Milo, Tova, Geva, Mor
Large language models (LLMs) often respond confidently to questions even when they lack the necessary information, leading to hallucinated answers. In this work, we study the problem of (un)answerability detection, focusing on extractive question answering (QA) where the model should determine if a passage contains sufficient information to answer a given question. We propose a simple approach for identifying a direction in the model's activation space that captures unanswerability and uses it for classification. This direction is selected by applying activation additions during inference and measuring their impact on the model's abstention behavior. We show that projecting hidden activations onto this direction yields a reliable score for (un)answerability classification. Experiments on two open-weight LLMs and four extractive QA benchmarks show that our method effectively detects unanswerable questions and generalizes better across datasets than existing prompt-based and classifier-based approaches. Moreover, the obtained directions extend beyond extractive QA to unanswerability that stems from factors, such as lack of scientific consensus and subjectivity. Last, causal interventions show that adding or ablating the directions effectively controls the abstention behavior of the model.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > Dominican Republic (0.04)
- (11 more...)
- Leisure & Entertainment (0.46)
- Government (0.46)
Can Video LLMs Refuse to Answer? Alignment for Answerability in Video Large Language Models
Yoon, Eunseop, Yoon, Hee Suk, Hasegawa-Johnson, Mark A., Yoo, Chang D.
In the broader context of deep learning, Multimodal Large Language Models have achieved significant breakthroughs by leveraging powerful Large Language Models as a backbone to align different modalities into the language space. A prime exemplification is the development of Video Large Language Models (Video-LLMs). While numerous advancements have been proposed to enhance the video understanding capabilities of these models, they are predominantly trained on questions generated directly from video content. However, in real-world scenarios, users often pose questions that extend beyond the informational scope of the video, highlighting the need for Video-LLMs to assess the relevance of the question. We demonstrate that even the best-performing Video-LLMs fail to reject unfit questions-not necessarily due to a lack of video understanding, but because they have not been trained to identify and refuse such questions. To address this limitation, we propose alignment for answerability, a framework that equips Video-LLMs with the ability to evaluate the relevance of a question based on the input video and appropriately decline to answer when the question exceeds the scope of the video, as well as an evaluation framework with a comprehensive set of metrics designed to measure model behavior before and after alignment. Furthermore, we present a pipeline for creating a dataset specifically tailored for alignment for answerability, leveraging existing video-description paired datasets.
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Singapore (0.04)
- North America > United States > Illinois (0.04)
- (2 more...)
- Research Report (0.64)
- Overview (0.46)
Toward Robust RALMs: Revealing the Impact of Imperfect Retrieval on Retrieval-Augmented Language Models
Retrieval Augmented Language Models (RALMs) have gained significant attention for their ability to generate accurate answer and improve efficiency. However, RALMs are inherently vulnerable to imperfect information due to their reliance on the imperfect retriever or knowledge source. We identify three common scenarios-unanswerable, adversarial, conflicting-where retrieved document sets can confuse RALM with plausible real-world examples. We present the first comprehensive investigation to assess how well RALMs detect and handle such problematic scenarios. Among these scenarios, to systematically examine adversarial robustness we propose a new adversarial attack method, Generative model-based ADVersarial attack (GenADV) and a novel metric Robustness under Additional Document (RAD). Our findings reveal that RALMs often fail to identify the unanswerability or contradiction of a document set, which frequently leads to hallucinations. Moreover, we show the addition of an adversary significantly degrades RALM's performance, with the model becoming even more vulnerable when the two scenarios overlap (adversarial+unanswerable). Our research identifies critical areas for assessing and enhancing the robustness of RALMs, laying the foundation for the development of more robust models.
- Asia > South Korea > Seoul > Seoul (0.05)
- North America > United States > Ohio (0.04)
- Europe > Norway > Eastern Norway > Oslo (0.04)
- (4 more...)
Robust Few-shot Transfer Learning for Knowledge Base Question Answering with Unanswerable Questions
Sawhney, Riya, Bhattacharya, Indrajit, Mausam, null
Real-world KBQA applications require models that are (1) robust -- e.g., can differentiate between answerable and unanswerable questions, and (2) low-resource -- do not require large training data. Towards this goal, we propose the novel task of few-shot transfer for KBQA with unanswerable questions. We present FUn-FuSIC that extends the state-of-the-art (SoTA) few-shot transfer model for answerable-only KBQA to handle unanswerability. It iteratively prompts an LLM to generate logical forms for the question by providing feedback using a diverse suite of syntactic, semantic and execution guided checks, and adapts self-consistency to assess confidence of the LLM to decide answerability. Experiments over newly constructed datasets show that FUn-FuSIC outperforms suitable adaptations of the SoTA model for KBQA with unanswerability, and the SoTA model for answerable-only few-shot-transfer KBQA.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Texas > Travis County > Austin (0.04)
- (2 more...)
RetinaQA: A Robust Knowledge Base Question Answering Model for both Answerable and Unanswerable Questions
Faldu, Prayushi, Bhattacharya, Indrajit, Mausam, null
An essential requirement for a real-world Knowledge Base Question Answering (KBQA) system is the ability to detect answerability of questions when generating logical forms. However, state-of-the-art KBQA models assume all questions to be answerable. Recent research has found that such models, when superficially adapted to detect answerability, struggle to satisfactorily identify the different categories of unanswerable questions, and simultaneously preserve good performance for answerable questions. Towards addressing this issue, we propose RetinaQA, a new KBQA model that unifies two key ideas in a single KBQA architecture: (a) discrimination over candidate logical forms, rather than generating these, for handling schema-related unanswerability, and (b) sketch-filling-based construction of candidate logical forms for handling data-related unaswerability. Our results show that RetinaQA significantly outperforms adaptations of state-of-the-art KBQA models in handling both answerable and unanswerable questions and demonstrates robustness across all categories of unanswerability. Notably, RetinaQA also sets a new state-of-the-art for answerable KBQA, surpassing existing models.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
Answerability in Retrieval-Augmented Open-Domain Question Answering
Abdumalikov, Rustam, Minervini, Pasquale, Kementchedjhieva, Yova
The performance of Open-Domain Question Answering (ODQA) retrieval systems can exhibit sub-optimal behavior, providing text excerpts with varying degrees of irrelevance. Unfortunately, many existing ODQA datasets lack examples specifically targeting the identification of irrelevant text excerpts. Previous attempts to address this gap have relied on a simplistic approach of pairing questions with random text excerpts. This paper aims to investigate the effectiveness of models trained using this randomized strategy, uncovering an important limitation in their ability to generalize to irrelevant text excerpts with high semantic overlap. As a result, we observed a substantial decrease in predictive accuracy, from 98% to 1%. To address this limitation, we discovered an efficient approach for training models to recognize such excerpts. By leveraging unanswerable pairs from the SQuAD 2.0 dataset, our models achieve a nearly perfect (~100%) accuracy when confronted with these challenging text excerpts.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Oceania > Australia > Victoria > Melbourne (0.04)
- North America > United States > Texas (0.04)
- (2 more...)
Gotcha! Don't trick me with unanswerable questions! Self-aligning Large Language Models for Responding to Unknown Questions
Deng, Yang, Zhao, Yong, Li, Moxin, Ng, See-Kiong, Chua, Tat-Seng
Despite the remarkable abilities of Large Language Models (LLMs) to answer questions, they often display a considerable level of overconfidence even when the question does not have a definitive answer. To avoid providing hallucinated answers to these unknown questions, existing studies typically investigate approaches to refusing to answer these questions. In this work, we propose a novel and scalable self-alignment method to utilize the LLM itself to enhance its response-ability to different types of unknown questions, being capable of not only refusing to answer but also providing explanation to the unanswerability of unknown questions. Specifically, the Self-Align method first employ a two-stage class-aware self-augmentation approach to generate a large amount of unknown question-response data. Then we conduct disparity-driven self-curation to select qualified data for fine-tuning the LLM itself for aligning the responses to unknown questions as desired. Experimental results on two datasets across four types of unknown questions validate the superiority of the Self-Align method over existing baselines in terms of three types of task formulation.
- Europe > United Kingdom > England > Greater London > London > Wimbledon (0.05)
- Asia > Singapore (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- (11 more...)
- Leisure & Entertainment (0.93)
- Media > Film (0.46)
A Lightweight Method to Generate Unanswerable Questions in English
Gautam, Vagrant, Zhang, Miaoran, Klakow, Dietrich
If a question cannot be answered with the available information, robust systems for question answering (QA) should know _not_ to answer. One way to build QA models that do this is with additional training data comprised of unanswerable questions, created either by employing annotators or through automated methods for unanswerable question generation. To show that the model complexity of existing automated approaches is not justified, we examine a simpler data augmentation method for unanswerable question generation in English: performing antonym and entity swaps on answerable questions. Compared to the prior state-of-the-art, data generated with our training-free and lightweight strategy results in better models (+1.6 F1 points on SQuAD 2.0 data with BERT-large), and has higher human-judged relatedness and readability. We quantify the raw benefits of our approach compared to no augmentation across multiple encoder models, using different amounts of generated data, and also on TydiQA-MinSpan data (+9.3 F1 points with BERT-large). Our results establish swaps as a simple but strong baseline for future work.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > Saxony-Anhalt (0.14)
- North America > Bermuda (0.05)
- (18 more...)
Do I have the Knowledge to Answer? Investigating Answerability of Knowledge Base Questions
Patidar, Mayur, Faldu, Prayushi, Singh, Avinash, Vig, Lovekesh, Bhattacharya, Indrajit, Mausam, null
When answering natural language questions over knowledge bases, missing facts, incomplete schema and limited scope naturally lead to many questions being unanswerable. While answerability has been explored in other QA settings, it has not been studied for QA over knowledge bases (KBQA). We create GrailQAbility, a new benchmark KBQA dataset with unanswerability, by first identifying various forms of KB incompleteness that make questions unanswerable, and then systematically adapting GrailQA (a popular KBQA dataset with only answerable questions). Experimenting with three state-of-the-art KBQA models, we find that all three models suffer a drop in performance even after suitable adaptation for unanswerable questions. In addition, these often detect unanswerability for wrong reasons and find specific forms of unanswerability particularly difficult to handle. This underscores the need for further research in making KBQA systems robust to unanswerability
- North America > United States (0.04)
- Europe > Germany (0.04)