Goto

Collaborating Authors

 xai-method


A Close Look at Decomposition-based XAI-Methods for Transformer Language Models

arXiv.org Artificial Intelligence

Various XAI attribution methods have been recently proposed for the transformer architecture, allowing for insights into the decision-making process of large language models by assigning importance scores to input tokens and intermediate representations. One class of methods that seems very promising in this direction includes decomposition-based approaches, i.e., XAI-methods that redistribute the model's prediction logit through the network, as this value is directly related to the prediction. In the previous literature we note though that two prominent methods of this category, namely ALTI-Logit and LRP, have not yet been analyzed in juxtaposition and hence we propose to close this gap by conducting a careful quantitative evaluation w.r.t. ground truth annotations on a subject-verb agreement task, as well as various qualitative inspections, using BERT, GPT-2 and LLaMA-3 as a testbed. Along the way we compare and extend the ALTI-Logit and LRP methods, including the recently proposed AttnLRP variant, from an algorithmic and implementation perspective. We further incorporate in our benchmark two widely-used gradient-based attribution techniques. Finally, we make our carefullly constructed benchmark dataset for evaluating attributions on language models, as well as our code, publicly available in order to foster evaluation of XAI-methods on a well-defined common ground.


Towards Benchmarking Explainable Artificial Intelligence Methods

arXiv.org Artificial Intelligence

The currently dominating artificial intelligence and machine learning technology, neural networks, builds on inductive statistical learning. Neural networks of today are information processing systems void of understanding and reasoning capabilities, consequently, they cannot explain promoted decisions in a humanly valid form. In this work, we revisit and use fundamental philosophy of science theories as an analytical lens with the goal of revealing, what can be expected, and more importantly, not expected, from methods that aim to explain decisions promoted by a neural network. By conducting a case study we investigate a selection of explainability method's performance over two mundane domains, animals and headgear. Through our study, we lay bare that the usefulness of these methods relies on human domain knowledge and our ability to understand, generalise and reason. The explainability methods can be useful when the goal is to gain further insights into a trained neural network's strengths and weaknesses. If our aim instead is to use these explainability methods to promote actionable decisions or build trust in ML-models they need to be less ambiguous than they are today. In this work, we conclude from our study, that benchmarking explainability methods, is a central quest towards trustworthy artificial intelligence and machine learning.