Gnothi Seauton: Empowering Faithful Self-Interpretability in Black-Box Models
Wang, Shaobo, Tang, Hongxuan, Wang, Mingyang, Zhang, Hongrui, Liu, Xuyang, Li, Weiya, Hu, Xuming, Zhang, Linfeng
arXiv.org Artificial Intelligence
The debate between self-interpretable models and post-hoc explanations for black-box models is central to Explainable AI (XAI). Self-interpretable models, such as concept-based networks, offer insights by connecting decisions to human-understandable concepts but often struggle with performance and scalability. Conversely, post-hoc methods like Shapley values, while theoretically robust, are computationally expensive and resource-intensive. To bridge the gap between these two lines of research, we propose a novel method that combines their strengths, providing theoretically guaranteed self-interpretability for black-box models without compromising prediction accuracy. Specifically, we introduce a parameter-efficient pipeline, AutoGnothi, which integrates a small side network into the black-box model, allowing it to generate Shapley value explanations without changing the original network parameters. This side-tuning approach significantly reduces memory, training, and inference costs, outperforming traditional parameter-efficient methods, for which full fine-tuning serves as the optimal baseline. AutoGnothi enables the black-box model to predict and explain its predictions with minimal overhead. Extensive experiments show that AutoGnothi offers accurate explanations for both vision and language tasks, delivering superior computational efficiency with comparable interpretability.

Explainable AI (XAI) has gained increasing significance as AI systems are widely deployed in both vision (Dosovitskiy, 2020; Radford et al., 2021; Kirillov et al., 2023) and language domains (Devlin et al., 2019; Brown, 2020; Achiam et al., 2023). Ensuring interpretability in these systems is vital for fostering trust, ensuring fairness, and adhering to legal standards, particularly for complex models such as transformers. As illustrated in Figure 1(a), the ideal paradigm for XAI involves designing inherently transparent models that deliver superior performance.
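To make the cost argument concrete, the following is a minimal sketch (not the paper's method) of exact Shapley attribution: each feature's value is its marginal contribution averaged over all n! feature orderings, which is why post-hoc Shapley explanations become expensive and why amortized approaches like AutoGnothi are attractive. Here `toy_value` is a hypothetical stand-in for a model evaluated on a subset of features.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value_fn):
    """Exact Shapley values: average each player's marginal contribution
    over every ordering of the players (cost grows as n!)."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = set()
        for p in order:
            before = value_fn(frozenset(coalition))
            coalition.add(p)
            phi[p] += value_fn(frozenset(coalition)) - before
    n_orderings = factorial(len(players))
    return {p: v / n_orderings for p, v in phi.items()}

# Hypothetical "restricted model": features A and B contribute additively;
# A and C add an extra interaction term only when both are present.
def toy_value(subset):
    v = 0.0
    if "A" in subset:
        v += 1.0
    if "B" in subset:
        v += 2.0
    if "A" in subset and "C" in subset:
        v += 1.0
    return v

phi = shapley_values(["A", "B", "C"], toy_value)
# The A-C interaction (worth 1.0) is split equally between A and C,
# so phi = {"A": 1.5, "B": 2.0, "C": 0.5}, summing to toy_value({A,B,C}) = 4.
```

Note the efficiency property visible here: the attributions sum exactly to the grand-coalition value, but the enumeration over orderings is what makes exact computation intractable for models with many input features.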
Oct-29-2024