Understanding the (Extra-)Ordinary: Validating Deep Model Decisions with Prototypical Concept-based Explanations

Dreyer, Maximilian, Achtibat, Reduan, Samek, Wojciech, Lapuschkin, Sebastian

Nov-28-2023–arXiv.org Artificial Intelligence

Ensuring both transparency and safety is critical when deploying Deep Neural Networks (DNNs) in high-risk applications, such as medicine. The field of explainable AI (XAI) has proposed various methods to comprehend the decision-making processes of opaque DNNs. However, few XAI methods are suitable for ensuring safety in practice, as most rely heavily on repeated, labor-intensive, and possibly biased human assessment. In this work, we present a novel post-hoc concept-based XAI framework that conveys not only instance-wise (local) but also class-wise (global) decision-making strategies via prototypes. What sets our approach apart is the combination of local and global strategies, enabling a clearer understanding of the (dis-)similarities in model decisions compared to the expected (prototypical) concept use, ultimately reducing the dependence on long-term human assessment. Quantifying the deviation from prototypical behavior not only allows predictions to be associated with specific model sub-strategies but also allows outlier behavior to be detected. As such, our approach constitutes an intuitive and explainable tool for model validation. We demonstrate the effectiveness of our approach in identifying out-of-distribution samples, spurious model behavior, and data quality issues across three datasets (ImageNet, CUB-200, and CIFAR-10) using VGG, ResNet, and EfficientNet architectures. Code is available at https://github.com/maxdreyer/pcx.
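
The idea of quantifying deviation from prototypical behavior can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). It assumes that each prediction is summarized by a vector of per-concept relevance scores, that a class prototype is simply the mean of those vectors, and that deviation is measured as a standardized Euclidean distance. The data, the function names (`fit_prototype`, `deviation`, `is_outlier`), and the 99th-percentile threshold are all hypothetical choices made for illustration.

```python
import numpy as np

def fit_prototype(vectors):
    """Summarize one class by the mean and per-concept spread of its vectors.

    Assumption: a single mean vector stands in for the class prototype.
    """
    mean = vectors.mean(axis=0)
    std = vectors.std(axis=0) + 1e-8  # small constant avoids division by zero
    return mean, std

def deviation(vector, mean, std):
    """Standardized Euclidean distance of one sample from the prototype."""
    return float(np.sqrt(np.sum(((vector - mean) / std) ** 2)))

# Hypothetical data: 200 samples of one class, 5 concept-relevance scores each.
rng = np.random.default_rng(0)
typical_profile = np.array([0.5, 0.3, 0.1, 0.05, 0.05])
class_vectors = rng.normal(loc=typical_profile, scale=0.05, size=(200, 5))

mean, std = fit_prototype(class_vectors)

# Assumed rule: flag samples that deviate more than 99% of the class samples do.
class_deviations = np.array([deviation(v, mean, std) for v in class_vectors])
threshold = np.percentile(class_deviations, 99)

def is_outlier(vector):
    """True if the sample uses concepts unlike the class prototype."""
    return deviation(vector, mean, std) > threshold

# A sample with the usual concept profile, and one with the profile reversed.
typical_sample = typical_profile.copy()
unusual_sample = typical_profile[::-1].copy()

print("typical flagged:", is_outlier(typical_sample))
print("unusual flagged:", is_outlier(unusual_sample))
```

A single mean per class cannot represent several sub-strategies within one class. Capturing those would require multiple prototypes per class, for example by clustering the vectors first, which this sketch leaves out.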

artificial intelligence, machine learning, natural language

Country:
- Europe (0.67)
- North America > United States (0.28)

Genre:
- Research Report (1.00)

Industry:
- Government (0.93)
- Transportation (1.00)

Technology:
- Information Technology
  - Artificial Intelligence
    - Machine Learning
      - Neural Networks > Deep Learning (0.88)
      - Performance Analysis > Accuracy (0.67)
      - Statistical Learning (1.00)
    - Natural Language (1.00)
    - Vision (1.00)
  - Data Science > Data Mining (1.00)