Conflict type


Micro-Act: Mitigating Knowledge Conflict in LLM-based RAG via Actionable Self-Reasoning

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) systems commonly suffer from knowledge conflicts, where retrieved external knowledge contradicts the inherent, parametric knowledge of large language models (LLMs). Such conflicts degrade performance on downstream tasks such as question answering (QA). Existing approaches often attempt to mitigate conflicts by comparing the two knowledge sources directly, side by side, but this can overwhelm LLMs with extraneous or lengthy contexts, ultimately hindering their ability to identify and mitigate inconsistencies. To address this issue, we propose Micro-Act, a framework with a hierarchical action space that automatically perceives context complexity and adaptively decomposes each knowledge source into a sequence of fine-grained comparisons. These comparisons are represented as actionable steps, enabling reasoning beyond the superficial context. In extensive experiments, Micro-Act consistently and significantly improves QA accuracy over state-of-the-art baselines across all five benchmark datasets and all three conflict types, especially the temporal and semantic types, on which all baselines fail significantly. More importantly, Micro-Act simultaneously maintains robust performance on non-conflict questions, highlighting its practical value in real-world RAG applications.
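To make "fine-grained comparisons" concrete, here is a minimal Python sketch, assuming both knowledge sources have already been reduced to atomic facts keyed by (subject, attribute). It is not Micro-Act's hierarchical action space or its LLM-driven reasoning; the function names, the fact representation and the toy data are hypothetical. It only illustrates why comparing one atomic fact at a time isolates a conflict that a side-by-side reading of long passages could bury.

```python
from typing import Dict, List, Tuple

# Hypothetical fact representation: (subject, attribute) -> value.
# A real system would have to extract such facts from text, e.g. with an LLM.
Fact = Tuple[str, str]


def decompose(source: Dict[Fact, str]) -> List[Tuple[Fact, str]]:
    """Break a knowledge source into an ordered list of atomic facts."""
    return sorted(source.items())


def fine_grained_conflicts(
    context: Dict[Fact, str], memory: Dict[Fact, str]
) -> List[Tuple[Fact, str, str]]:
    """Compare retrieved context and parametric memory one fact at a time.

    Returns (fact, context_value, memory_value) for every fact that both
    sources mention but disagree on; facts only one source mentions are skipped.
    """
    conflicts = []
    for key, ctx_value in decompose(context):
        mem_value = memory.get(key)
        if mem_value is not None and mem_value != ctx_value:
            conflicts.append((key, ctx_value, mem_value))
    return conflicts


# Toy temporal conflict (hypothetical entity and values).
retrieved = {("ExampleCorp", "ceo"): "B. Lee", ("ExampleCorp", "hq"): "Oslo"}
parametric = {("ExampleCorp", "ceo"): "A. Kim", ("ExampleCorp", "hq"): "Oslo"}
print(fine_grained_conflicts(retrieved, parametric))
# [(('ExampleCorp', 'ceo'), 'B. Lee', 'A. Kim')]
```

Only the disagreeing fact is surfaced; the shared fact about the headquarters never reaches the comparison output.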


COMMET: A System for Human-Induced Conflicts in Mobile Manipulation of Everyday Tasks

arXiv.org Artificial Intelligence

Continuous advances in robotics and AI are driving the integration of robots from industry into everyday environments. However, dynamic and unpredictable human activities in daily life can directly or indirectly conflict with robot actions. Moreover, because of the social nature of such human-induced conflicts, solutions are not always unique and depend heavily on the user's personal preferences. To address these challenges and facilitate the development of household robots, we propose COMMET, a system for human-induced COnflicts in Mobile Manipulation of Everyday Tasks. COMMET employs a hybrid detection approach that begins with multi-modal retrieval and escalates to fine-tuned model inference for low-confidence cases. GPT-4o is then used to summarize user preferences from relevant cases, based on the collected options and settings that users prefer. In preliminary studies, our detection module achieves better accuracy and lower latency than GPT models. To facilitate future research, we also design a user-friendly interface for user data collection and demonstrate an effective workflow for real-world deployments.
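The escalation pattern in COMMET's hybrid detector (cheap retrieval first, a heavier model only for low-confidence cases) can be sketched as follows. This is a minimal illustration under stated assumptions, not COMMET's implementation: the embeddings, the use of cosine similarity as the confidence score, the 0.9 threshold, the labels and the stand-in fallback model are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Case:
    embedding: Tuple[float, ...]  # hypothetical multi-modal embedding of a stored case
    label: str                    # hypothetical label, e.g. "conflict" / "no_conflict"


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def detect(
    query: Sequence[float],
    memory: List[Case],
    fallback: Callable[[Sequence[float]], str],
    threshold: float = 0.9,
) -> Tuple[str, str]:
    """Return (label, stage): reuse the nearest stored case if it is similar
    enough, otherwise escalate to the (slower) fallback model."""
    scored = [(cosine(query, case.embedding), case.label) for case in memory]
    if scored:
        best_score, best_label = max(scored)
        if best_score >= threshold:
            return best_label, "retrieval"
    return fallback(query), "model"


def stand_in_model(query: Sequence[float]) -> str:
    """Placeholder for a fine-tuned classifier (hypothetical)."""
    return "no_conflict"


memory = [Case((1.0, 0.0), "conflict"), Case((0.0, 1.0), "no_conflict")]
print(detect((0.99, 0.05), memory, stand_in_model))  # ('conflict', 'retrieval')
print(detect((0.7, 0.7), memory, stand_in_model))    # ('no_conflict', 'model')
```

Raising `threshold` sends more queries to the slower model; lowering it leans more on retrieval, which is the accuracy/latency trade-off a hybrid detector is meant to balance.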


DRAGged into Conflicts: Detecting and Addressing Conflicting Sources in Search-Augmented LLMs

arXiv.org Artificial Intelligence

Retrieval Augmented Generation (RAG) is a commonly used approach for enhancing large language models (LLMs) with relevant and up-to-date information. However, the retrieved sources can contain conflicting information, and it remains unclear how models should address such discrepancies. In this work, we first propose a novel taxonomy of knowledge conflict types in RAG, along with the desired model behavior for each type. We then introduce CONFLICTS, a high-quality benchmark with expert annotations of conflict types in a realistic RAG setting. CONFLICTS is the first benchmark that enables tracking progress on how models address a wide range of knowledge conflicts. We conduct extensive experiments on this benchmark, showing that LLMs often struggle to appropriately resolve conflicts between sources. While prompting LLMs to explicitly reason about potential conflicts in the retrieved documents significantly improves the quality and appropriateness of their responses, substantial room for improvement remains for future research.


Taming Knowledge Conflicts in Language Models

arXiv.org Artificial Intelligence

Language Models (LMs) often encounter knowledge conflicts when parametric memory contradicts contextual knowledge. Previous works attribute this conflict to the interplay between "memory heads" and "context heads", attention heads assumed to promote either memory or context exclusively. In this study, we go beyond this fundamental assumption by uncovering a critical phenomenon we term the "superposition of contextual information and parametric memory", in which highly influential attention heads can contribute to both memory and context simultaneously. Building on this insight, we propose Just Run Twice (JUICE), a test-time attention intervention method that steers LMs toward either parametric beliefs or contextual knowledge without requiring fine-tuning. JUICE identifies a set of reliable attention heads and leverages a dual-run approach to mitigate the superposition effects. Extensive experiments across 11 datasets and 6 model architectures demonstrate that JUICE achieves new state-of-the-art performance and robust generalization, with significant and consistent improvements across different domains and conflict types. Finally, we theoretically analyze knowledge conflicts and the superposition of contextual information and parametric memory in attention heads, which further elucidates the effectiveness of JUICE in these settings.


Scope of Pre-trained Language Models for Detecting Conflicting Health Information

arXiv.org Artificial Intelligence

An increasing number of people now rely on online platforms to meet their health information needs. Thus, identifying inconsistent or conflicting textual health information has become a safety-critical task. Health advice data poses a unique challenge: information that is accurate in the context of one diagnosis can be conflicting in the context of another. For example, people suffering from diabetes and hypertension often receive conflicting health advice on diet. This motivates the need for technologies that can provide contextualized, user-specific health advice. A crucial step towards contextualized advice is the ability to compare health advice statements and detect whether and how they conflict. This is the task of health conflict detection (HCD). Given two pieces of health advice, the goal of HCD is to detect and categorize the type of conflict. It is a challenging task, as (i) automatically identifying and categorizing conflicts requires a deeper understanding of the semantics of the text, and (ii) the amount of available data is quite limited. In this study, we are the first to explore HCD in the context of pre-trained language models. We find that DeBERTa-v3 performs best, with a mean F1 score of 0.68 across all experiments. We additionally investigate the challenges posed by different conflict types and how synthetic data improves a model's understanding of conflict-specific semantics. Finally, we highlight the difficulty of collecting real health conflicts and propose a human-in-the-loop synthetic data augmentation approach to expand existing HCD datasets. Our HCD training dataset is more than twice the size of the existing HCD dataset and is publicly available on GitHub.


Detecting Logical Relation In Contract Clauses

arXiv.org Artificial Intelligence

… is a difficult task that requires an accurate understanding of natural language meaning. The ambiguity and variability of linguistic expression in natural language complicate the recognition of the relations, such as entailment and contradiction, contained in texts. The ability to classify these logical inferences among different texts is a significant feature of an intelligent system (Bos and Markert 2005). Detecting these logical relations can help humans interpret complex text, where entailment and contradiction are crucial …

In an entailment relation, if p is true then h cannot be false; otherwise, there is a contradiction. NLI is a broader task than conflict identification, and thus good models for classifying logical relations will naturally be applicable to detecting contract conflicts. Importantly, since NLI has seen a surge in research, including new machine learning models and dataset curation (Bowman et al. 2015; Williams, Nangia, and Bowman 2018), it offers substantially more labelled training data than contract conflict datasets (Aires, Pinheiro, and Meneguzzi 2017).
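The argument that NLI models transfer to contract conflict detection can be illustrated with a minimal sketch: run a three-way NLI classifier over every pair of clauses and flag the pairs labelled as contradiction. `toy_nli` is a hypothetical keyword rule standing in for a trained NLI model (for instance, one trained on the NLI datasets cited above); neither function is the method of the cited work.

```python
from itertools import combinations
from typing import Callable, List, Tuple

NLILabel = str  # "entailment" | "neutral" | "contradiction"


def toy_nli(premise: str, hypothesis: str) -> NLILabel:
    """Hypothetical stand-in for a trained NLI model.

    Treats two clauses as contradictory when they differ only by a
    "shall" / "shall not" negation; identical clauses entail each other.
    """
    p, h = premise.lower(), hypothesis.lower()
    if p == h:
        return "entailment"
    if p.replace(" shall not ", " shall ") == h.replace(" shall not ", " shall "):
        return "contradiction"
    return "neutral"


def find_conflicts(
    clauses: List[str], nli: Callable[[str, str], NLILabel]
) -> List[Tuple[int, int]]:
    """Return index pairs of clauses that the NLI model labels as contradictory."""
    conflicts = []
    for i, j in combinations(range(len(clauses)), 2):
        # Premise and hypothesis roles are asymmetric, so query both directions.
        labels = (nli(clauses[i], clauses[j]), nli(clauses[j], clauses[i]))
        if "contradiction" in labels:
            conflicts.append((i, j))
    return conflicts


clauses = [
    "The supplier shall deliver the goods by March 1.",
    "The supplier shall not deliver the goods by March 1.",
    "The buyer shall pay within 30 days.",
]
print(find_conflicts(clauses, toy_nli))  # [(0, 1)]
```

Swapping `toy_nli` for a real classifier leaves `find_conflicts` unchanged, which is the sense in which a good NLI model is directly reusable for conflict detection.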


Classifying Norm Conflicts using Learned Semantic Representations

arXiv.org Artificial Intelligence

While most social norms are informal, they are often formalized by companies in contracts to regulate trades of goods and services. When poorly written, contracts may contain normative conflicts resulting from opposing deontic meanings or contradictory specifications. As contracts tend to be long and contain many norms, manually identifying such conflicts requires human effort, which is time-consuming and error-prone. Automating this task benefits contract makers by increasing productivity …

As natural language expresses ideas in diverse and often vague ways, identifying a norm conflict and its causes in contracts is a challenging task. The ever larger number of contracts currently being generated necessitates a fast and reliable process to identify norm conflicts. However, since such contracts are written in natural language, traditional revision methods involve contract makers reading the contract and identifying conflicting points between norms. Such a method requires huge human effort and may not guarantee a revision that eliminates all conflicts. In response, we provide three contributions towards automatically identifying and classifying potential conflicts between norms in contracts.
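To make "opposing deontic meanings" concrete, here is a minimal rule-based sketch over norms that have already been structured as (party, modality, action). It assumes one common formulation, in which an obligation or a permission conflicts with a prohibition of the same action by the same party. The work above classifies conflicts using learned semantic representations of natural-language clauses, which this sketch does not attempt; the `Norm` type, the `OPPOSING` table and the example norms are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Norm:
    party: str     # who the norm applies to
    modality: str  # "obligation" | "permission" | "prohibition"
    action: str    # what the party must / may / must not do


# Modality pairs treated as opposing when party and action coincide (assumed formulation).
OPPOSING = {
    frozenset({"obligation", "prohibition"}),
    frozenset({"permission", "prohibition"}),
}


def deontic_conflicts(norms: List[Norm]) -> List[Tuple[Norm, Norm]]:
    """Return every pair of norms with opposing modalities on the same party and action."""
    found = []
    for a, b in combinations(norms, 2):
        same_target = a.party == b.party and a.action == b.action
        if same_target and frozenset({a.modality, b.modality}) in OPPOSING:
            found.append((a, b))
    return found


norms = [
    Norm("seller", "obligation", "deliver goods"),
    Norm("seller", "prohibition", "deliver goods"),
    Norm("buyer", "permission", "inspect goods"),
]
for a, b in deontic_conflicts(norms):
    print(f"{a.modality} vs {b.modality}: {a.party} / {a.action}")
# obligation vs prohibition: seller / deliver goods
```

The check itself is trivial once norms are structured; the difficulty described above lies in recognizing the party, modality and action in free-text clauses, which is where learned representations are needed.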