Do Sparse Subnetworks Exhibit Cognitively Aligned Attention? Effects of Pruning on Saliency Map Fidelity, Sparsity, and Concept Coherence
Sanish Suwal, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi
arXiv.org Artificial Intelligence
Prior work has shown that neural networks can be heavily pruned while preserving performance, but the impact of pruning on model interpretability remains unclear. In this work, we investigate how magnitude-based pruning followed by fine-tuning affects both low-level saliency maps and high-level concept representations. Using a ResNet-18 trained on ImageNette, we compare post-hoc explanations from Vanilla Gradients (VG) and Integrated Gradients (IG) across pruning levels, evaluating sparsity and faithfulness. We further apply CRAFT-based concept extraction to track changes in the semantic coherence of learned concepts. Our results show that light-to-moderate pruning improves saliency-map focus and faithfulness while retaining distinct, semantically meaningful concepts. In contrast, aggressive pruning merges heterogeneous features, reducing saliency-map sparsity and concept coherence despite maintaining accuracy. These findings suggest that while pruning can shape internal representations toward more human-aligned attention patterns, excessive pruning undermines interpretability.
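The unstructured magnitude-based pruning the abstract refers to can be sketched as follows. This is an illustrative NumPy implementation under common assumptions (global, unstructured pruning by absolute weight value), not the authors' code; the function name and `sparsity` parameter are hypothetical.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights.

    A minimal sketch of unstructured magnitude pruning: weights whose
    absolute value falls at or below the k-th smallest magnitude are
    masked to zero; in practice the model is then fine-tuned.
    """
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.5, -0.1], [0.05, 2.0]])
pruned, mask = magnitude_prune(w, 0.5)
# pruned -> [[0.5, 0.0], [0.0, 2.0]]: the two smallest-magnitude
# weights (0.05 and -0.1) are removed at 50% sparsity.
```

In a real pipeline this mask would be applied per layer (or globally) to a trained network, followed by fine-tuning, before recomputing the VG/IG saliency maps.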
Oct-7-2025