counterfactual statement
Tracing Facts or just Copies? A critical investigation of the Competitions of Mechanisms in Large Language Models
Campregher, Dante, Chen, Yanxu, Hoffman, Sander, Heuss, Maria
This paper presents a reproducibility study examining how Large Language Models (LLMs) manage competing factual and counterfactual information, focusing on the role of attention heads in this process. We attempt to reproduce and reconcile findings from three recent studies (Ortu et al.; Yu, Merullo, and Pavlick; and McDougall et al.) that investigate the competition between model-learned facts and contradictory context information through Mechanistic Interpretability tools. Our study specifically examines the relationship between attention head strength and factual output ratios, evaluates competing hypotheses about attention heads' suppression mechanisms, and investigates the domain specificity of these attention patterns. Our findings suggest that attention heads promoting factual output do so via general copy suppression rather than selective counterfactual suppression, as strengthening them can also inhibit correct facts. Additionally, we show that attention head behavior is domain-dependent, with larger models exhibiting more specialized and category-sensitive patterns.
Influence of External Information on Large Language Models Mirrors Social Cognitive Patterns
Bian, Ning, Lin, Hongyu, Liu, Peilin, Lu, Yaojie, Zhang, Chunkang, He, Ben, Han, Xianpei, Sun, Le
Social cognitive theory explains how people learn and acquire knowledge through observing others. Recent years have witnessed the rapid development of large language models (LLMs), which suggests their potential significance as agents in society. LLMs, as AI agents, can observe external information, which shapes their cognition and behaviors. However, the extent to which external information influences LLMs' cognition and behaviors remains unclear. This study investigates how external statements and opinions influence LLMs' thoughts and behaviors from a social cognitive perspective. Three experiments were conducted to explore the effects of external information on LLMs' memories, opinions, and social media behavioral decisions. Sociocognitive factors, including source authority, social identity, and social role, were analyzed to investigate their moderating effects. Results showed that external information can significantly shape LLMs' memories, opinions, and behaviors, with these changes mirroring human social cognitive patterns such as authority bias, in-group bias, emotional positivity, and emotion contagion. This underscores the challenges in developing safe and unbiased LLMs, and emphasizes the importance of understanding the susceptibility of LLMs to external influences.
Assisting clinical practice with fuzzy probabilistic decision trees
Ambags, Emma L., Capitoli, Giulia, L'Imperio, Vincenzo, Provenzano, Michele, Nobile, Marco S., Liò, Pietro
The need for fully human-understandable models is increasingly being recognised as a central theme in AI research. The acceptance of AI models to assist in decision making in sensitive domains will grow when these models are interpretable, and this trend towards interpretable models will be amplified by upcoming regulations. One of the killer applications of interpretable AI is medical practice, which can benefit from accurate decision support methodologies that inherently generate trust. In this work, we propose FPT (MedFP), a novel method that combines probabilistic trees and fuzzy logic to assist clinical practice. This approach is fully interpretable as it allows clinicians to generate, control and verify the entire diagnosis procedure; one of the methodology's strengths is the capability to decrease the frequency of misdiagnoses by providing an estimate of uncertainties and counterfactuals. Our approach is applied as a proof-of-concept to two real medical scenarios: classifying malignant thyroid nodules and predicting the risk of progression in chronic kidney disease patients. Our results show that probabilistic fuzzy decision trees can provide interpretable support to clinicians. Furthermore, introducing fuzzy variables into the probabilistic model brings significant nuances that are lost when using the crisp thresholds set by traditional probabilistic decision trees. We show that FPT and its predictions can assist clinical practice in an intuitive manner, with the use of a user-friendly interface specifically designed for this purpose. Moreover, we discuss the interpretability of the FPT model.
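The contrast the abstract draws between crisp thresholds and fuzzy variables can be illustrated with a minimal sketch. This is not the FPT method itself: the linear ramp membership, the function names, and all numeric values below are assumptions chosen purely for illustration.

```python
def crisp_high(value, threshold):
    """Crisp split: full membership in 'high' above the threshold, none below."""
    return 1.0 if value > threshold else 0.0


def fuzzy_high(value, low, high):
    """Assumed linear ramp: membership in 'high' rises from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


def blended_risk(value, low, high, p_if_low, p_if_high):
    """Weight two (hypothetical) branch probabilities by the fuzzy membership."""
    mu = fuzzy_high(value, low, high)
    return mu * p_if_high + (1.0 - mu) * p_if_low


# Hypothetical measurement near a cut-off of 5 on a 0-10 scale.
# Two nearly identical values fall on opposite sides of the crisp split,
# while the fuzzy membership changes only slightly between them.
print(crisp_high(4.9, 5.0), crisp_high(5.1, 5.0))
print(fuzzy_high(4.9, 0.0, 10.0), fuzzy_high(5.1, 0.0, 10.0))
print(blended_risk(5.0, 0.0, 10.0, p_if_low=0.2, p_if_high=0.8))
```

With a crisp split, a small measurement difference around the threshold flips the branch entirely; with a graded membership, the resulting estimate moves smoothly, which is one way to read the "nuances" the abstract refers to.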
I Wish I Would Have Loved This One, But I Didn't -- A Multilingual Dataset for Counterfactual Detection in Product Reviews
O'Neill, James, Rozenshtein, Polina, Kiryo, Ryuichi, Kubota, Motoko, Bollegala, Danushka
Counterfactual statements describe events that did not or cannot take place. We consider the problem of counterfactual detection (CFD) in product reviews. For this purpose, we annotate a multilingual CFD dataset from Amazon product reviews covering counterfactual statements written in English, German, and Japanese languages. The dataset is unique as it contains counterfactuals in multiple languages, covers a new application area of e-commerce reviews, and provides high quality professional annotations. We train CFD models using different text representation methods and classifiers. We find that these models are robust against the selectional biases introduced due to cue phrase-based sentence selection. Moreover, our CFD dataset is compatible with prior datasets and can be merged to learn accurate CFD models. Applying machine translation on English counterfactual examples to create multilingual data performs poorly, demonstrating the language-specificity of this problem, which has been ignored so far.
BUT-FIT at SemEval-2020 Task 5: Automatic detection of counterfactual statements with deep pre-trained language representation models
Fajcik, Martin, Jon, Josef, Docekal, Martin, Smrz, Pavel
This paper describes BUT-FIT's submission at SemEval-2020 Task 5: Modelling Causal Reasoning in Language: Detecting Counterfactuals. The challenge focused on detecting whether a given statement contains a counterfactual (Subtask 1) and extracting both antecedent and consequent parts of the counterfactual from the text (Subtask 2). We experimented with various state-of-the-art language representation models (LRMs). We found RoBERTa LRM to perform the best in both subtasks. We achieved first place in both exact match and F1 for Subtask 2 and ranked second for Subtask 1.
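For readers unfamiliar with the two Subtask 2 metrics named above, the following is a rough sketch of how exact match and F1 are commonly computed for extracted text spans. It follows the familiar token-overlap formulation used in span-extraction evaluation; the official task scorer may differ (for example, it may operate on character offsets), and the function names and example strings are hypothetical.

```python
from collections import Counter


def exact_match(predicted, gold):
    """1.0 if the spans match after lowercasing and trimming whitespace, else 0.0."""
    return float(predicted.strip().lower() == gold.strip().lower())


def token_f1(predicted, gold):
    """Harmonic mean of token-level precision and recall between two spans."""
    pred_tokens = predicted.lower().split()
    gold_tokens = gold.lower().split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical antecedent span: the prediction drops the leading "if".
gold_span = "if it had rained"
predicted_span = "it had rained"
print(exact_match(predicted_span, gold_span))
print(token_f1(predicted_span, gold_span))
```

Exact match rewards only a perfectly recovered span, while token F1 gives partial credit for overlapping tokens, which is why systems are often ranked on both.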
IITK-RSA at SemEval-2020 Task 5: Detecting Counterfactuals
Ojha, Anirudh Anil, Garg, Rohin, Gupta, Shashank, Modi, Ashutosh
This paper describes our efforts in tackling Task 5 of SemEval-2020. The task involved detecting a class of textual expressions known as counterfactuals and separating them into their constituent elements. Counterfactual statements describe events that have not or could not have occurred and the possible implications of such events. While counterfactual reasoning is natural for humans, understanding these expressions is difficult for artificial agents due to a variety of linguistic subtleties. Our final submitted approaches were an ensemble of various fine-tuned transformer-based and CNN-based models for the first subtask and a transformer model with dependency tree information for the second subtask. We ranked 4th and 9th on the overall leaderboard. We also explored various other approaches that involved the use of classical methods, other neural architectures and the incorporation of different linguistic features.
SemEval-2020 Task 5: Detecting Counterfactuals by Disambiguation
Akl, Hanna Abi, Mariko, Dominique, Labidurie, Estelle
In this paper, we explore strategies to detect and evaluate counterfactual sentences. Since causal insight is an inherent characteristic of a counterfactual, is it possible to use this information in order to locate antecedent and consequent fragments in counterfactual statements? We thus propose to compare and evaluate models to correctly identify and chunk counterfactual sentences. In our experiments, we attempt to answer the following questions: First, can a learned model discern counterfactual statements reasonably well? Second, is it possible to clearly identify antecedent and consequent parts of counterfactual sentences?