Attribution for Enhanced Explanation with Transferable Adversarial eXploration
Zhu, Zhiyu; Zhang, Jiayu; Jin, Zhibo; Chen, Huaming; Zhou, Jianlong; Chen, Fang
The interpretability of deep neural networks is crucial for understanding model decisions in various applications, including computer vision. AttEXplore++, an advanced framework built upon AttEXplore, enhances attribution by incorporating transferable adversarial attack methods such as MIG and GRA, significantly improving the accuracy and robustness of model explanations. We conduct extensive experiments on five models, including CNNs (Inception-v3, ResNet-50, VGG16) and vision transformers (MaxViT-T, ViT-B/16), using the ImageNet dataset. Our method achieves an average performance improvement of 7.57% over AttEXplore and 32.62% over other state-of-the-art interpretability algorithms. Using insertion and deletion scores as evaluation metrics, we show that adversarial transferability plays a vital role in enhancing attribution results. Furthermore, we examine the impact of randomness, perturbation rate, noise amplitude, and diversity probability on attribution performance, demonstrating that AttEXplore++ provides more stable and reliable explanations across various models. We release our code at: https://anonymous.4open.science/r/A

With the widespread application of Deep Neural Networks (DNNs) in critical fields such as medical diagnostics, autonomous driving, and financial forecasting, the interpretability of their decision-making processes has become an essential research direction [1], [2], [3]. Although DNN models demonstrate excellent performance across various complex tasks, their black-box nature limits our understanding of their internal workings [4], [5], [6]. This lack of transparency not only hinders users' trust in model decisions but also complicates the evaluation and correction of models in real-world applications [7], particularly in domains with high security and fairness requirements [8]. The goal of interpretability methods is to enhance the transparency of DNNs by revealing how models derive decisions from input features [9].
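The insertion and deletion scores mentioned above can be illustrated with a short sketch. The following is a minimal, hedged example (not the authors' released code) of how such metrics are commonly computed: pixels are progressively revealed into a blurred baseline (insertion) or removed from the original image (deletion) in order of attribution importance, and the area under the resulting confidence curve is reported. The `model`, `image`, `attr`, and `target` arguments, the step count, and the blurred baseline are illustrative assumptions, not values taken from the paper.

```python
# Minimal sketch of insertion/deletion attribution metrics (assumptions noted above).
import torch
import torch.nn.functional as F


def insertion_deletion_scores(model, image, attr, target, steps=100):
    """image: (1, 3, H, W) tensor; attr: (H, W) attribution map; target: class index."""
    model.eval()
    _, _, h, w = image.shape
    n_pixels = h * w

    # Rank pixels from most to least important according to the attribution map.
    order = attr.flatten().argsort(descending=True)

    # Insertion starts from a blurred copy; deletion starts from the original image.
    blurred = F.avg_pool2d(image, kernel_size=11, stride=1, padding=5)

    def run(start, end):
        probs = []
        canvas = start.clone()
        for i in range(steps + 1):
            with torch.no_grad():
                p = F.softmax(model(canvas), dim=1)[0, target].item()
            probs.append(p)
            # Copy the next most-important chunk of pixels from `end` into the canvas.
            lo = i * n_pixels // steps
            hi = min((i + 1) * n_pixels // steps, n_pixels)
            idx = order[lo:hi]
            canvas.view(1, 3, -1)[..., idx] = end.view(1, 3, -1)[..., idx]
        # Mean probability approximates the area under the confidence curve.
        return sum(probs) / len(probs)

    insertion = run(start=blurred, end=image)  # higher is better
    deletion = run(start=image, end=blurred)   # lower is better
    return insertion, deletion
```

Under this reading, a faithful attribution map yields a high insertion score (confidence recovers quickly as important pixels are revealed) and a low deletion score (confidence collapses quickly as they are removed).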
arXiv.org Artificial Intelligence
Dec-27-2024
- Genre:
  - Research Report > New Finding (0.46)
- Industry:
  - Health & Medicine > Therapeutic Area (0.46)
  - Information Technology > Security & Privacy (0.71)
- Technology: