 generating counterfactual explanation


DifCluE: Generating Counterfactual Explanations with Diffusion Autoencoders and modal clustering

arXiv.org Artificial Intelligence

Generating multiple counterfactual explanations for different modes within a class presents a significant challenge, as these modes are distinct yet converge under the same classification. Diffusion probabilistic models (DPMs) have demonstrated a strong ability to capture the underlying modes of data distributions. In this paper, we harness the power of a Diffusion Autoencoder to generate multiple distinct counterfactual explanations. By clustering in the latent space, we uncover the directions corresponding to the different modes within a class, enabling the generation of diverse and meaningful counterfactuals. We introduce a novel methodology, DifCluE, which consistently identifies these modes and produces more reliable counterfactual explanations. Our experimental results demonstrate that DifCluE outperforms the current state-of-the-art in generating multiple counterfactual explanations, offering a significant advancement in model interpretability.


A Comparative Analysis of Counterfactual Explanation Methods for Text Classifiers

arXiv.org Artificial Intelligence

Counterfactual explanations can be used to interpret and debug text classifiers by producing minimally altered text inputs that change a classifier's output. In this work, we evaluate five methods for generating counterfactual explanations for a BERT text classifier on two datasets using three evaluation metrics. The results of our experiments suggest that established white-box substitution-based methods are effective at generating valid counterfactuals that change the classifier's output. In contrast, newer methods based on large language models (LLMs) excel at producing natural and linguistically plausible text counterfactuals but often fail to generate valid counterfactuals that alter the classifier's output. Based on these results, we recommend developing new counterfactual explanation methods that combine the strengths of established gradient-based approaches and newer LLM-based techniques to generate high-quality, valid, and plausible text counterfactual explanations.
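The two properties this abstract contrasts, validity (the classifier's output changes) and minimal alteration of the text, can be illustrated with a small sketch. The abstract does not name its three metrics, so the functions below are assumptions chosen for illustration: `toy_classifier` is a hypothetical keyword rule standing in for the BERT classifier, `flip_rate` is one plausible way to score validity, and `changed_token_fraction` is a crude positional measure of how much text was altered.

```python
def toy_classifier(text):
    # Hypothetical stand-in for a real text classifier (assumption, not BERT).
    negative_words = {"bad", "boring", "awful"}
    tokens = text.lower().split()
    return "negative" if any(t in negative_words for t in tokens) else "positive"


def flip_rate(pairs, classify):
    # Share of (original, counterfactual) pairs whose predicted label differs.
    if not pairs:
        return 0.0
    flipped = sum(1 for original, cf in pairs if classify(original) != classify(cf))
    return flipped / len(pairs)


def changed_token_fraction(original, counterfactual):
    # Position-by-position token comparison; tokens missing on either side
    # count as changed. This is a rough proxy, not a true edit distance.
    a, b = original.split(), counterfactual.split()
    n = max(len(a), len(b))
    if n == 0:
        return 0.0
    changed = sum(1 for i in range(n) if i >= len(a) or i >= len(b) or a[i] != b[i])
    return changed / n


pairs = [
    ("the film was boring", "the film was gripping"),  # label flips
    ("an awful plot", "an awful script"),              # label does not flip
]
print(flip_rate(pairs, toy_classifier))
print(changed_token_fraction("the film was boring", "the film was gripping"))
```

In this toy setup the second pair reads naturally but leaves the prediction unchanged, which is the kind of failure the abstract attributes to some LLM-based methods.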


TX-Gen: Multi-Objective Optimization for Sparse Counterfactual Explanations for Time-Series Classification

arXiv.org Artificial Intelligence

In time-series classification, understanding model decisions is crucial for their application in high-stakes domains such as healthcare and finance. Counterfactual explanations, which provide insights by presenting alternative inputs that change model predictions, offer a promising solution. However, existing methods for generating counterfactual explanations for time-series data often struggle with balancing key objectives like proximity, sparsity, and validity. In this paper, we introduce TX-Gen, a novel algorithm for generating counterfactual explanations based on the Non-dominated Sorting Genetic Algorithm II (NSGA-II). TX-Gen leverages evolutionary multi-objective optimization to find a diverse set of counterfactuals that are both sparse and valid, while maintaining minimal dissimilarity to the original time series. By incorporating a flexible reference-guided mechanism, our method improves the plausibility and interpretability of the counterfactuals without relying on predefined assumptions. Extensive experiments on benchmark datasets demonstrate that TX-Gen outperforms existing methods in generating high-quality counterfactuals, making time-series models more transparent and interpretable.
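The trade-off between proximity and sparsity can be made concrete with the Pareto-dominance test that NSGA-II builds on. The sketch below is not TX-Gen and not a full NSGA-II implementation: it only filters a fixed list of candidates down to the non-dominated set. The L1 distance used for proximity, the count of changed time steps used for sparsity, and the premise that every candidate already flips the prediction are all assumptions made for illustration.

```python
def proximity(original, candidate):
    # L1 distance between the two series (lower is better). Assumed metric.
    return sum(abs(a - b) for a, b in zip(original, candidate))


def sparsity(original, candidate):
    # Number of time steps that were modified (lower is better). Assumed metric.
    return sum(1 for a, b in zip(original, candidate) if a != b)


def dominates(p, q):
    # p dominates q if it is no worse on every objective and strictly better on one.
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def pareto_front(original, candidates):
    # Keep the candidates whose objective vector is not dominated by any other.
    scored = [(c, (proximity(original, c), sparsity(original, c))) for c in candidates]
    return [c for c, s in scored if not any(dominates(t, s) for _, t in scored)]


original = [0, 0, 0, 0]
# Hypothetical counterfactuals, all assumed to be valid (prediction already flipped).
candidates = [
    [3, 0, 0, 0],  # objectives (3, 1)
    [1, 1, 0, 0],  # objectives (2, 2)
    [2, 2, 0, 0],  # objectives (4, 2), dominated by (2, 2)
    [1, 1, 1, 1],  # objectives (4, 4), dominated by (2, 2)
]
print(pareto_front(original, candidates))  # [[3, 0, 0, 0], [1, 1, 0, 0]]
```

Neither surviving candidate is better on both objectives: one changes a single time step by a large amount, the other changes two steps by small amounts. An evolutionary method would return such a set rather than a single answer.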


Generating Counterfactual Explanations Using Cardinality Constraints

arXiv.org Artificial Intelligence

Providing explanations about how machine learning algorithms work and/or make particular predictions is one of the main tools that can be used to improve their trustworthiness, fairness and robustness. Among the most intuitive types of explanations are counterfactuals, which are examples that differ from a given point only in the prediction target and some set of features, showing which features need to be changed in the original example to flip the prediction for that example. However, such counterfactuals can differ from the original example in many features, making their interpretation difficult. In this paper, we propose to explicitly add a cardinality constraint to counterfactual generation limiting how many features can be different from the original example, thus providing more interpretable and easily understandable counterfactuals. Explainable Artificial Intelligence (XAI) can be defined as the study and implementation of methods that provide visibility into how an AI system makes decisions and predictions and executes its actions (Rai, 2020).
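A cardinality constraint of this kind can be sketched as a search that never changes more than k features. The code below is a brute-force illustration under stated assumptions, not the paper's method: `predict` is a hypothetical toy classifier, `candidate_values` is an assumed finite set of allowed values per feature, and the exhaustive enumeration is only practical for very small problems.

```python
from itertools import combinations, product


def predict(x):
    # Hypothetical toy classifier: returns 1 when a weighted score reaches 10.
    income, debt, years = x
    return 1 if 2 * income - 3 * debt + years >= 10 else 0


def counterfactual_with_cardinality(x, candidate_values, k, predict_fn):
    # Try changing 1 feature, then 2, ... up to k, and return the first
    # modified example whose prediction differs from the original one.
    original_label = predict_fn(x)
    for size in range(1, k + 1):
        for idxs in combinations(range(len(x)), size):
            # Only consider values that actually differ from the original.
            pools = [[v for v in candidate_values[i] if v != x[i]] for i in idxs]
            for values in product(*pools):
                cf = list(x)
                for i, v in zip(idxs, values):
                    cf[i] = v
                if predict_fn(cf) != original_label:
                    return cf, idxs
    return None  # no counterfactual exists within the cardinality limit


x = [3, 2, 1]  # predicted 0 by the toy classifier
candidate_values = [[3, 5], [1, 2], [1, 3]]  # assumed allowed values per feature
print(counterfactual_with_cardinality(x, candidate_values, 2, predict))  # None
print(counterfactual_with_cardinality(x, candidate_values, 3, predict))  # ([5, 1, 3], (0, 1, 2))
```

With these assumed values, no change to one or two features flips the prediction, so the constraint k = 2 yields no counterfactual, while k = 3 admits one that alters every feature. Tightening k trades coverage for interpretability.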


T-COL: Generating Counterfactual Explanations for General User Preferences on Variable Machine Learning Systems

arXiv.org Artificial Intelligence

Machine learning (ML) based systems have long suffered from a lack of interpretability. To address this problem, counterfactual explanations (CEs) have been proposed. CEs are unique as they provide workable suggestions to users, in addition to explaining why a certain outcome was predicted. However, the application of CEs has been hindered by two main challenges, namely general user preferences and variable ML systems. User preferences, in particular, tend to be general rather than specific feature values. Additionally, CEs need to be customized to suit the variability of ML models, while also maintaining robustness even when these validation models change. To overcome these challenges, we propose several possible general user preferences that have been validated by user research and map them to the properties of CEs. We also introduce a new method called Tree-based Conditions Optional Links (T-COL), which has two optional structures and several groups of conditions for generating CEs that can be adapted to general user preferences. Meanwhile, a group of conditions leads T-COL to generate more robust CEs that have higher validity when the ML model is replaced. We compared the properties of CEs generated by T-COL experimentally under different user preferences and demonstrated that T-COL is better suited for accommodating user preferences and variable ML systems compared to baseline methods including Large Language Models.


Explaining Predictions from Machine Learning Models: Algorithms, Users, and Pedagogy

arXiv.org Artificial Intelligence

Model explainability has become an important problem in machine learning (ML) due to the increased effect that algorithmic predictions have on humans. Explanations can help users understand not only why ML models make certain predictions, but also how these predictions can be changed. In this thesis, we examine the explainability of ML models from three vantage points: algorithms, users, and pedagogy, and contribute several novel solutions to the explainability problem.


Model-Agnostic Counterfactual Explanations for Consequential Decisions

arXiv.org Artificial Intelligence

Predictive models are being increasingly used to support consequential decision making at the individual level in contexts such as pretrial bail and loan approval. As a result, there is increasing social and legal pressure to provide explanations that help the affected individuals not only to understand why a prediction was output, but also how to act to obtain a desired outcome. To this end, several works have proposed methods to generate counterfactual explanations. However, they are often restricted to a particular subset of models (e.g., decision trees or linear models), and cannot directly handle the mixed (numerical and nominal) nature of the features describing each individual. In this paper, we propose a model-agnostic algorithm to generate counterfactual explanations that builds on the standard theory and tools from formal verification. Specifically, our algorithm solves a sequence of satisfiability problems, where a wide variety of predictive models and distances in mixed feature spaces, as well as natural notions of plausibility and diversity, are represented as logic formulas. Our experiments on real-world data demonstrate that our approach can flexibly handle widely deployed predictive models, while providing meaningfully closer counterfactuals than existing approaches.