Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?
Marcinkevičs, Ričards, Laguna, Sonia, Vandenhirtz, Moritz, Vogt, Julia E.
arXiv.org Artificial Intelligence
Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs), which predict step by step: first the high-level concepts from the raw features, then the target variable from the predicted concepts. A compelling advantage of this model class is the user's ability to intervene on the predicted concept values, affecting the model's downstream output. In this work, we introduce a method to perform such concept-based interventions on already-trained neural networks, which are not interpretable by design, given an annotated validation set. Furthermore, we formalise the model's intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black-box models. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks. We demonstrate that fine-tuning improves intervention effectiveness and often yields better-calibrated predictions. To showcase the practical utility of the proposed techniques, we apply them to deep chest X-ray classifiers and show that fine-tuned black boxes can be as intervenable as CBMs while being more performant.
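To make the idea of intervening on a black box concrete, here is a minimal sketch of one plausible reading of the abstract, not the authors' implementation. It assumes a linear logistic probe (`W_q`, `b_q`) that maps hidden activations `z` to concept probabilities, as could be fitted on an annotated validation set, and it edits `z` so that the probe agrees with user-specified concepts while staying close to the original activations. The objective, the trade-off weight `lam`, the toy weights, and the linear head `w_g` are all illustrative assumptions.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def probe(z, W_q, b_q):
    """Hypothetical linear probe: concept probabilities from activations z."""
    return sigmoid(W_q @ z + b_q)

def bce(p, c):
    """Summed binary cross-entropy between probabilities p and targets c."""
    return float(-np.sum(c * np.log(p) + (1.0 - c) * np.log(1.0 - p)))

def intervene(z, c_target, W_q, b_q, lam=5.0, lr=0.05, steps=500):
    """Edit activations so the probe agrees with the user-given concepts.

    Minimises lam * BCE(probe(z'), c_target) + 0.5 * ||z' - z||^2
    by plain gradient descent, starting from z' = z (assumed objective).
    """
    z_new = z.copy()
    for _ in range(steps):
        # Gradient of the summed BCE through a logistic probe: W_q^T (p - c).
        grad_concepts = W_q.T @ (probe(z_new, W_q, b_q) - c_target)
        grad_distance = z_new - z
        z_new = z_new - lr * (lam * grad_concepts + grad_distance)
    return z_new

# Toy setup (all values are made up for illustration).
W_q = np.array([[1.0, -1.0, 0.0, 0.5],
                [0.0, 0.5, 1.0, -1.0]])   # 2 concepts, 4 hidden units
b_q = np.zeros(2)
w_g = np.array([0.8, -0.4, 1.2, 0.3])      # hypothetical linear head g

z = np.array([0.5, -0.2, 0.1, 0.3])        # activations of the black box
c_target = np.array([0.0, 1.0])            # concept values set by the user

z_new = intervene(z, c_target, W_q, b_q)

print("concepts before:", probe(z, W_q, b_q))
print("concepts after: ", probe(z_new, W_q, b_q))
print("output before:  ", sigmoid(w_g @ z))
print("output after:   ", sigmoid(w_g @ z_new))
```

Under these assumptions, one rough proxy for the effectiveness of an intervention would be to compare the target loss of the head on `z` with that on `z_new`; the abstract's formal definition of intervenability may differ.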
Jan-24-2024
- Country:
- North America > United States
- California (0.04)
- New York > New York County
- New York City (0.04)
- Massachusetts > Suffolk County
- Boston (0.04)
- Florida > Miami-Dade County
- Miami (0.04)
- Europe > Switzerland
- Asia
- Middle East > Israel (0.04)
- Japan > Honshū
- Kansai > Kyoto Prefecture > Kyoto (0.04)
- Genre:
- Research Report > New Finding (0.93)
- Industry:
- Transportation > Air (1.00)
- Health & Medicine
- Diagnostic Medicine > Imaging (1.00)
- Nuclear Medicine (0.68)
- Technology: