Training Feature Attribution for Vision Models
arXiv.org Artificial Intelligence
Deep neural networks are often considered opaque systems, prompting the need for explainability methods that improve trust and accountability. Existing approaches typically attribute test-time predictions either to input features (e.g., pixels in an image) or to influential training examples. We argue that both perspectives should be studied jointly. This work explores training feature attribution, which links test predictions to specific regions of specific training images and thereby provides new insights into the inner workings of deep models. Our experiments on vision datasets show that training feature attribution yields fine-grained, test-specific explanations: it identifies harmful examples that drive misclassifications and reveals spurious correlations, such as patch-based shortcuts, that conventional attribution methods fail to expose.

Deep neural networks have achieved state-of-the-art performance across a wide range of domains, including image recognition, natural language processing, and multimodal reasoning (He et al., 2016; Devlin et al., 2019; Radford et al., 2021). However, this impressive performance comes at the cost of transparency: modern deep models operate as complex, highly parameterized black boxes, and the reasoning behind individual predictions is often opaque (Lipton, 2018). This opacity can undermine user trust, hinder debugging, and conceal harmful biases or spurious correlations (Arjovsky et al., 2019; DeGrave et al., 2021).
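To make the idea of attributing a test prediction to regions of a training image concrete, here is a minimal toy sketch. It assumes a TracIn-style influence score (the dot product of test and training loss gradients) and a linear model, neither of which is specified by the paper itself: for a linear model, the gradient with respect to the weights is proportional to the input, so the influence dot product decomposes into one term per training pixel, giving a region-level attribution map. All names and the model choice here are illustrative assumptions, not the authors' method.

```python
import numpy as np

# Toy sketch (assumption): TracIn-style influence of a training example on a
# test example is grad_w L(test) . grad_w L(train). For a linear model w.x
# with squared-error loss, grad_w L = (w.x - y) * x, so the dot product
# decomposes per input feature (pixel), attributing influence to specific
# regions of the training image.

rng = np.random.default_rng(0)
d = 16                            # flattened 4x4 "image"
w = rng.normal(size=d) * 0.1      # toy model weights

x_train = rng.normal(size=d)
y_train = 1.0
x_test = rng.normal(size=d)
y_test = 1.0

def grad(w, x, y):
    # Squared-error loss L = 0.5 * (w.x - y)^2, so grad_w L = (w.x - y) * x.
    return (w @ x - y) * x

g_test = grad(w, x_test, y_test)
g_train = grad(w, x_train, y_train)

# Per-pixel attribution: keep the elementwise products instead of summing.
per_pixel = g_test * g_train          # one influence term per training pixel
total_influence = per_pixel.sum()     # standard scalar influence score

region_map = per_pixel.reshape(4, 4)  # map attributions back to image layout
assert np.isclose(region_map.sum(), total_influence)
```

Summing `region_map` recovers the usual example-level influence score, so this decomposition refines (rather than replaces) training-example attribution; large-magnitude entries point at the training-image regions that drive the score.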
Oct-13-2025