FaCT: Faithful Concept Traces for Explaining Neural Network Decisions
Neural Information Processing Systems
Deep networks have shown remarkable performance across a wide range of tasks, yet obtaining a global, concept-level understanding of how they function remains a key challenge. Many post-hoc concept-based approaches have been introduced to explain their workings, but they are not always faithful to the model. Further, they make restrictive assumptions about the concepts a model learns, such as class-specificity, small spatial extent, or alignment with human expectations. In this work, we emphasize the faithfulness of concept-based explanations and propose a new model with model-inherent mechanistic concept explanations. Our concepts are shared across classes and, from any layer, their contribution to the logit and their input visualization can be faithfully traced. We also leverage foundation models to propose a new concept-consistency metric, C2-score, that can be used to evaluate concept-based methods. Compared to prior work, we show that our concepts are quantitatively more consistent and that users find them more interpretable, while retaining competitive ImageNet performance.
Jun-21-2026, 22:56:19 GMT
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.92)
- Industry:
- Education (0.46)
- Technology: