llmSHAP: A Principled Approach to LLM Explainability
Filip Naudot, Tobias Sundqvist, Timotheus Kampik
arXiv.org Artificial Intelligence
The rise of data-driven algorithms and, most notably, applications of deep learning has led to concerns about a lack of thorough human oversight of socially important decisions that are either delegated in their entirety to machines, or made by humans based on machine recommendations. Explainable AI (XAI) approaches attempt to mitigate these concerns by helping (typically human) users understand how and why algorithms produce specific outputs [1]. An important class of XAI methods focuses on providing explanations of black-box classifiers that attribute classification outcomes (which one may consider decisions or decision recommendations) to input characteristics (feature values) [2, 3]. Such feature attribution methods can be considered meta-reasoning functions that approximate classifier behavior with the objective of providing users with a reasonably faithful intuition of the classifier's behavioral fundamentals. One of the most prominent feature attribution methods is SHAP, which is based on the Shapley value from cooperative game theory, which quantifies players' (feature values') contributions to coalition utility (classification outcomes) [4]. Feature attribution methods have, in general, limitations: notably, they are necessarily approximations, and as purely technical tools, they cannot fully consider crucial nuances of the socio-technical systems they are embedded in [5]; for example, the visualizations provided out-of-the-box by feature attribution software libraries may be difficult to interpret [6]. Still, Shapley value-based approaches can be considered a reasonable choice for facilitating black-box explainability, notably because (i) they are based on the well-established and intuitive mathematical principles of the Shapley value and (ii) there is at least some evidence of their potential usefulness, also relative to alternative approaches [6].
However, the Shapley value cannot straightforwardly be applied to inference from Large Language Models (LLMs), which power many of the currently emerging AI applications.
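To make the game-theoretic framing above concrete, the following is a minimal sketch of an exact Shapley value computation: players are feature indices and the value function is a stand-in for a model's output on a coalition of "present" features. The toy value function `v` and all names here are illustrative assumptions, not part of the paper; real SHAP implementations approximate this sum, since it is exponential in the number of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values for a cooperative game.

    players: list of player identifiers (here: feature indices).
    value:   function mapping a frozenset of players to coalition
             utility (here: a model output given those features).
    """
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Standard Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                # Marginal contribution of p when joining coalition S
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi

# Toy "classifier": additive feature contributions plus one interaction.
def v(coalition):
    base = {0: 2.0, 1: 1.0, 2: 0.5}
    total = sum(base[p] for p in coalition)
    if 0 in coalition and 1 in coalition:
        total += 1.0  # interaction shared between features 0 and 1
    return total

phi = shapley_values([0, 1, 2], v)
# The interaction is split evenly between features 0 and 1,
# and the attributions sum to v(all players) (efficiency axiom).
```

The efficiency property (attributions sum to the full-coalition utility) is one of the "well-established and intuitive mathematical principles" the abstract refers to.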
Nov-4-2025