Goto

Collaborating Authors

 counterfactual intervention


Measuring the Reliability of Causal Probing Methods: Tradeoffs, Limitations, and the Plight of Nullifying Interventions

arXiv.org Artificial Intelligence

Causal probing is an approach to interpreting foundation models, such as large language models, by training probes to recognize latent properties of interest from embeddings, intervening on probes to modify this representation, and analyzing the resulting changes in the model's behavior. While some recent works have cast doubt on the theoretical basis of several leading causal probing intervention methods, it has been unclear how to systematically and empirically evaluate their effectiveness in practice. To address this problem, we propose a general empirical analysis framework to evaluate the reliability of causal probing interventions, formally defining and quantifying two key causal probing desiderata: completeness (fully transforming the representation of the target property) and selectivity (minimally impacting other properties). Our formalism allows us to make the first direct comparisons between different families of causal probing methods (e.g., linear vs. nonlinear or counterfactual vs. nullifying interventions). We conduct extensive experiments across several leading methods, finding that (1) there is an inherent tradeoff between these criteria, and no method is able to consistently satisfy both at once; and (2) across the board, nullifying interventions are always far less complete than counterfactual interventions, indicating that nullifying methods may not be an effective approach to causal probing.


On Counterfactual Interventions in Vector Autoregressive Models

arXiv.org Artificial Intelligence

Counterfactual reasoning allows us to explore hypothetical scenarios in order to explain the impacts of our decisions. However, addressing such inquires is impossible without establishing the appropriate mathematical framework. In this work, we introduce the problem of counterfactual reasoning in the context of vector autoregressive (VAR) processes. We also formulate the inference of a causal model as a joint regression task where for inference we use both data with and without interventions. After learning the model, we exploit linearity of the VAR model to make exact predictions about the effects of counterfactual interventions. Furthermore, we quantify the total causal effects of past counterfactual interventions. The source code for this project is freely available at https://github.com/KurtButler/counterfactual_interventions.


Towards Faithful Model Explanation in NLP: A Survey

arXiv.org Artificial Intelligence

End-to-end neural Natural Language Processing (NLP) models are notoriously difficult to understand. This has given rise to numerous efforts towards model explainability in recent years. One desideratum of model explanation is faithfulness, i.e. an explanation should accurately represent the reasoning process behind the model's prediction. In this survey, we review over 110 model explanation methods in NLP through the lens of faithfulness. We first discuss the definition and evaluation of faithfulness, as well as its significance for explainability. We then introduce recent advances in faithful explanation, grouping existing approaches into five categories: similarity-based methods, analysis of model-internal structures, backpropagation-based methods, counterfactual intervention, and self-explanatory models. For each category, we synthesize its representative studies, strengths, and weaknesses. Finally, we summarize their common virtues and remaining challenges, and reflect on future work directions towards faithful explainability in NLP.


Estimation of Counterfactual Interventions under Uncertainties

arXiv.org Artificial Intelligence

Counterfactual analysis is intuitively performed by humans on a daily basis eg. "What should I have done differently to get the loan approved?". Such counterfactual questions also steer the formulation of scientific hypotheses. More formally it provides insights about potential improvements of a system by inferring the effects of hypothetical interventions into a past observation of the system's behaviour which plays a prominent role in a variety of industrial applications. Due to the hypothetical nature of such analysis, counterfactual distributions are inherently ambiguous. This ambiguity is particularly challenging in continuous settings in which a continuum of explanations exist for the same observation. In this paper, we address this problem by following a hierarchical Bayesian approach which explicitly models such uncertainty. In particular, we derive counterfactual distributions for a Bayesian Warped Gaussian Process thereby allowing for non-Gaussian distributions and non-additive noise. We illustrate the properties our approach on a synthetic and on a semi-synthetic example and show its performance when used within an algorithmic recourse downstream task.


Synthesizing explainable counterfactual policies for algorithmic recourse with program synthesis

arXiv.org Artificial Intelligence

Being able to provide counterfactual interventions - sequences of actions we would have had to take for a desirable outcome to happen - is essential to explain how to change an unfavourable decision by a black-box machine learning model (e.g., being denied a loan request). Existing solutions have mainly focused on generating feasible interventions without providing explanations on their rationale. Moreover, they need to solve a separate optimization problem for each user. In this paper, we take a different approach and learn a program that outputs a sequence of explainable counterfactual actions given a user description and a causal graph. We leverage program synthesis techniques, reinforcement learning coupled with Monte Carlo Tree Search for efficient exploration, and rule learning to extract explanations for each recommended action. An experimental evaluation on synthetic and real-world datasets shows how our approach generates effective interventions by making orders of magnitude fewer queries to the black-box classifier with respect to existing solutions, with the additional benefit of complementing them with interpretable explanations.


Answer-Set Programs for Repair Updates and Counterfactual Interventions

arXiv.org Artificial Intelligence

We briefly describe -- mainly through very simple examples -- different kinds of answer-set programs with annotations that have been proposed for specifying: database repairs and consistent query answering; secrecy view and query evaluation with them; counterfactual interventions for causality in databases; and counterfactual-based explanations in machine learning.


Align-Deform-Subtract: An Interventional Framework for Explaining Object Differences

arXiv.org Artificial Intelligence

Given two object images, how can we explain their differences in terms of the underlying object properties? To address this question, we propose Align-Deform-Subtract (ADS) -- an interventional framework for explaining object differences. By leveraging semantic alignments in image-space as counterfactual interventions on the underlying object properties, ADS iteratively quantifies and removes differences in object properties. The result is a set of "disentangled" error measures which explain object differences in terms of the underlying properties. Experiments on real and synthetic data illustrate the efficacy of the framework.


Answer-Set Programs for Reasoning about Counterfactual Interventions and Responsibility Scores for Classification

arXiv.org Artificial Intelligence

We describe how answer-set programs can be used to declaratively specify counterfactual interventions on entities under classification, and reason about them. In particular, they can be used to define and compute responsibility scores as attribution-based explanations for outcomes from classification models. The approach allows for the inclusion of domain knowledge and supports query answering. A detailed example with a naive-Bayes classifier is presented.


Declarative Approaches to Counterfactual Explanations for Classification

arXiv.org Artificial Intelligence

We propose answer-set programs that specify and compute counterfactual interventions as a basis for causality-based explanations to the outcomes from classification models. They can be applied with black-box models, and also with models that can be specified as logic programs, such as rule-based classifiers. The main focus is on the specification and computation of maximum-responsibility counterfactual explanations, with responsibility becoming an explanation score for features of entities under classification. We also extend the programs to bring into the picture semantic or domain knowledge. We show how the approach could be extended by means of probabilistic methods, and how the underlying probability distributions could be modified through the use of constraints.


Score-Based Explanations in Data Management and Machine Learning

arXiv.org Artificial Intelligence

We describe some approaches to explanations for observed outcomes in data management and machine learning. They are based on the assignment of numerical scores to predefined and potentially relevant inputs. More specifically, we consider explanations for query answers in databases, and for results from classification models. The described approaches are mostly of a causal and counterfactual nature. We argue for the need to bring domain and semantic knowledge into score computations; and suggest some ways to do this.