Climbing the Ladder of Interpretability with Counterfactual Concept Bottleneck Models

Gabriele Dominici, Pietro Barbiero, Francesco Giannini, Martin Gjoreski, Giuseppe Marra, Marc Langheinrich

arXiv.org Artificial Intelligence 

Current deep learning models are not designed to simultaneously address three fundamental questions: predict class labels to solve a given classification task (the "What?"), explain task predictions (the "Why?"), and imagine alternative scenarios that could result in different predictions (the "What if?"). The inability to answer these questions represents a crucial gap in deploying reliable AI agents, calibrating human trust, and deepening human-machine interaction. To bridge this gap, we introduce CounterFactual Concept Bottleneck Models (CF-CBMs), a class of models designed to address the above queries efficiently and all at once, without the need for post-hoc searches. Our results show that CF-CBMs produce accurate predictions (the "What?"), simple explanations for task predictions (the "Why?"), and interpretable counterfactuals (the "What if?"). CF-CBMs can also sample or estimate the most probable counterfactual to: (i) explain the effect of concept interventions on tasks, (ii) show users how to get a desired class label, and (iii) propose concept interventions via "task-driven" interventions.
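To make the three queries concrete, the sketch below shows one possible way a concept bottleneck model could expose a "What?" prediction, a "Why?" concept explanation, and an amortized "What if?" counterfactual head that proposes counterfactual concepts in a single forward pass rather than via a post-hoc search. This is a minimal illustration under our own assumptions, not the authors' architecture; names such as `CounterfactualCBM`, `cf_head`, and the layer sizes are hypothetical.

```python
# Minimal sketch of a concept bottleneck model with an amortized
# counterfactual head (illustrative only, not the CF-CBM implementation).
import torch
import torch.nn as nn


class CounterfactualCBM(nn.Module):
    def __init__(self, n_features: int, n_concepts: int, n_classes: int):
        super().__init__()
        self.n_classes = n_classes
        # x -> c: concept encoder (the interpretable bottleneck)
        self.concept_encoder = nn.Sequential(
            nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, n_concepts)
        )
        # c -> y: task predictor that only sees concepts
        self.task_predictor = nn.Linear(n_concepts, n_classes)
        # (c, y_target) -> c': proposes counterfactual concepts for a target label
        self.cf_head = nn.Sequential(
            nn.Linear(n_concepts + n_classes, 64), nn.ReLU(),
            nn.Linear(64, n_concepts),
        )

    def forward(self, x):
        c = torch.sigmoid(self.concept_encoder(x))  # "Why?": concept activations
        y_logits = self.task_predictor(c)           # "What?": task prediction
        return c, y_logits

    def counterfactual(self, c, target_class: int):
        """'What if?': propose counterfactual concepts for a desired class
        in one forward pass, i.e. without a post-hoc optimization loop."""
        y_target = nn.functional.one_hot(
            torch.full((c.shape[0],), target_class), self.n_classes
        ).float()
        c_cf = torch.sigmoid(self.cf_head(torch.cat([c, y_target], dim=-1)))
        y_cf_logits = self.task_predictor(c_cf)
        return c_cf, y_cf_logits


# Usage: predict, inspect concepts, then ask how to reach a desired label.
model = CounterfactualCBM(n_features=10, n_concepts=5, n_classes=2)
x = torch.randn(4, 10)
concepts, y_logits = model(x)                       # "What?" and "Why?"
c_cf, y_cf = model.counterfactual(concepts, target_class=1)  # "What if?"
```

Because the counterfactual head conditions directly on the predicted concepts and the desired label, comparing `concepts` with `c_cf` indicates which concept changes would plausibly flip the prediction, which mirrors the abstract's idea of task-driven concept interventions.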