Beyond Concept Bottleneck Models: How to Make Black Boxes Intervenable?

May-27-2025, 10:16:30 GMT–Neural Information Processing Systems

Recently, interpretable machine learning has re-explored concept bottleneck models (CBMs). An advantage of this model class is the user's ability to intervene on predicted concept values, thereby affecting the downstream output. In this work, we introduce a method to perform such concept-based interventions on pretrained neural networks, which are not interpretable by design, given only a small validation set with concept labels. Furthermore, we formalise the notion of intervenability as a measure of the effectiveness of concept-based interventions and leverage this definition to fine-tune black boxes. Empirically, we explore the intervenability of black-box classifiers on synthetic tabular and natural image benchmarks.
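For readers unfamiliar with the term, the following minimal sketch illustrates what intervening on predicted concept values means in an ordinary concept bottleneck model. It is not the method proposed in the paper for black-box networks. All function names, weights, and inputs are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict_concepts(x, concept_weights):
    """Concept predictor: one logistic unit per concept."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)))
            for row in concept_weights]


def predict_label(concepts, label_weights, bias=0.0):
    """Label predictor: sees only the concepts (the bottleneck)."""
    return sigmoid(sum(w * c for w, c in zip(label_weights, concepts)) + bias)


def intervene(concepts, corrections):
    """Replace selected predicted concepts with user-supplied values."""
    return [corrections.get(i, c) for i, c in enumerate(concepts)]


# Hypothetical toy parameters, not taken from the paper.
CONCEPT_WEIGHTS = [[2.0, 0.0], [0.0, -2.0]]
LABEL_WEIGHTS = [3.0, 3.0]
LABEL_BIAS = -3.0

x = [1.0, 1.0]
c_hat = predict_concepts(x, CONCEPT_WEIGHTS)
y_before = predict_label(c_hat, LABEL_WEIGHTS, LABEL_BIAS)

# A user decides that concept 1 is actually present and overrides it.
c_fixed = intervene(c_hat, {1: 1.0})
y_after = predict_label(c_fixed, LABEL_WEIGHTS, LABEL_BIAS)

print(f"before intervention: {y_before:.3f}")
print(f"after intervention:  {y_after:.3f}")
```

Because the label predictor depends only on the concept vector, correcting a single concept changes the output directly. A pretrained black box has no such explicit concept layer, which is the gap the abstract describes.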

concept bottleneck model, intervenability, make black box intervenable

Industry:
- Transportation > Air (0.94)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.61)