Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

Marcinkevičs, Ričards; Laguna, Sonia; Vandenhirtz, Moritz; Vogt, Julia E.

Jan-24-2024–arXiv.org Artificial Intelligence

Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which first predict high-level concepts from the raw features and then predict the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, thereby affecting the model's downstream output. In this work, given an annotated validation set, we introduce a method to perform such concept-based interventions on already-trained neural networks that are not interpretable by design. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as CBMs and more performant.
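To make the notion of a concept-based intervention concrete, the following is a minimal sketch of a toy concept bottleneck model, not the method proposed in the paper. It only illustrates the two-stage structure described in the abstract (features to concepts, concepts to target) and how overwriting a predicted concept changes the downstream output. The class name `ToyCBM`, the weights, and the input are all hypothetical and chosen purely for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping real values to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


class ToyCBM:
    """Hypothetical two-stage model: features -> concepts -> target."""

    def __init__(self, W_concept, w_target):
        self.W_concept = W_concept  # shape (n_features, n_concepts)
        self.w_target = w_target    # shape (n_concepts,)

    def predict_concepts(self, x):
        # Stage 1: predict concept probabilities from raw features.
        return sigmoid(x @ self.W_concept)

    def predict_target(self, c):
        # Stage 2: predict the target from concept values only.
        return sigmoid(c @ self.w_target)

    def predict(self, x, interventions=None):
        c = self.predict_concepts(x)
        if interventions:
            # Intervention: a user overwrites selected predicted concepts
            # with values they believe to be correct.
            c = c.copy()
            for idx, value in interventions.items():
                c[..., idx] = value
        return c, self.predict_target(c)


# Illustrative (made-up) weights and input.
W_concept = np.array([[2.0, -1.0], [0.5, 1.5], [-1.0, 0.0]])
w_target = np.array([3.0, -2.0])
model = ToyCBM(W_concept, w_target)

x = np.array([0.2, -0.4, 1.0])
c_before, y_before = model.predict(x)
# Set concept 0 to "present" and observe the effect on the target.
c_after, y_after = model.predict(x, interventions={0: 1.0})
print("target before intervention:", y_before)
print("target after intervention: ", y_after)
```

In this toy setting the target prediction depends on the input only through the concepts, so editing a concept value directly changes the output. A black-box network has no such explicit bottleneck, which is the gap the paper's intervention method and intervenability measure are intended to address.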

dataset, intervenability, intervention

Country:
- North America > United States
  - California (0.04)
  - New York > New York County
    - New York City (0.04)
  - Massachusetts > Suffolk County
    - Boston (0.04)
  - Florida > Miami-Dade County
    - Miami (0.04)
- Europe > Switzerland
  - Zürich > Zürich (0.14)
- Asia
  - Middle East > Israel (0.04)
  - Japan > Honshū
    - Kansai > Kyoto Prefecture > Kyoto (0.04)

Genre:
- Research Report > New Finding (0.93)

Industry:
- Transportation > Air (1.00)
- Health & Medicine
  - Diagnostic Medicine > Imaging (1.00)
  - Nuclear Medicine (0.68)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (1.00)
  - Neural Networks > Deep Learning (0.46)