Hsu, Aliyah
Enhancing CBMs Through Binary Distillation with Applications to Test-Time Intervention
Shen, Matthew, Hsu, Aliyah, Agarwal, Abhineet, Yu, Bin
Concept bottleneck models~(CBMs) aim to improve model interpretability by predicting human-interpretable ``concepts'' in a bottleneck within a deep learning model architecture. However, how the predicted concepts are used to predict the target either remains a black box or is simplified to maintain interpretability at the cost of prediction performance. We propose to use Fast Interpretable Greedy Sum-Trees~(FIGS) to perform Binary Distillation~(BD). This new method, called FIGS-BD, distills a binary-augmented concept-to-target portion of the CBM into an interpretable tree-based model while mimicking the competitive prediction performance of the CBM teacher. FIGS-BD can be used in downstream tasks to explain and decompose CBM predictions into interpretable binary-concept-interaction attributions and to guide adaptive test-time intervention. Across $4$ datasets, we demonstrate that adaptive test-time intervention identifies key concepts that significantly improve performance in realistic human-in-the-loop settings that allow only a limited number of concept interventions.
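A minimal sketch of the distillation idea described in this abstract, not the authors' released code: it assumes a trained CBM object exposing hypothetical predict_concepts() and predict() methods, and fits a FIGS student (via the imodels package) on concept predictions augmented with their binarized versions, using the teacher's predictions as the distillation target.

# Illustrative sketch (assumptions: `cbm` exposes predict_concepts() and predict()).
import numpy as np
from imodels import FIGSClassifier  # FIGS implementation from the imodels package

def distill_cbm_head(cbm, X_train, max_rules=20, threshold=0.5):
    """Fit a FIGS student that mimics the CBM's concept-to-target mapping."""
    # Concept predictions from the bottleneck (n_samples x n_concepts).
    concept_probs = cbm.predict_concepts(X_train)
    # Binary augmentation: append thresholded concepts so the student can
    # split on crisp, human-readable concept indicators.
    concept_bin = (concept_probs >= threshold).astype(float)
    student_inputs = np.hstack([concept_probs, concept_bin])
    # Distillation target: the teacher CBM's own predictions, not the true labels.
    teacher_preds = cbm.predict(X_train)
    student = FIGSClassifier(max_rules=max_rules)
    student.fit(student_inputs, teacher_preds)
    return student

Because the student is a sum of shallow trees over (binarized) concepts, each prediction decomposes into a small number of concept-interaction terms, which is the kind of attribution that can then rank concepts for limited human intervention at test time.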
A generative framework to bridge data-driven models and scientific theories in language neuroscience
Antonello, Richard, Singh, Chandan, Jain, Shailee, Hsu, Aliyah, Gao, Jianfeng, Yu, Bin, Huth, Alexander
Data-driven deep learning models can now predict many experimental outcomes with remarkable accuracy. However, these models are not scientific theories that describe the world in natural language. Instead, they are implemented as vast neural networks with millions or billions of largely inscrutable parameters. One emblematic field is language neuroscience, where large language models (LLMs) are highly effective at predicting human brain responses to natural language but are virtually impossible to interpret or analyze by hand [4-10]. To overcome this challenge, we introduce the generative explanation-mediated validation (GEM-V) framework. GEM-V translates deep learning models of language selectivity in the brain into concise verbal explanations and then designs follow-up experiments to verify that these explanations are causally related to brain activity.
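A purely illustrative sketch of the two-stage idea summarized above, not the authors' pipeline: it assumes a hypothetical sentence-embedding function `embed`, fits a voxelwise ridge encoding model, and scores a candidate verbal explanation by the predicted-response gap between explanation-matching sentences and control sentences.

# Illustrative sketch (assumption: `embed(sentence)` returns a fixed-length vector).
import numpy as np
from sklearn.linear_model import Ridge

def fit_encoding_model(stim_embeddings, voxel_response, alpha=1.0):
    """Ridge encoding model mapping stimulus embeddings to a single voxel's response."""
    model = Ridge(alpha=alpha)
    model.fit(stim_embeddings, voxel_response)
    return model

def explanation_effect(model, embed, matching_sents, control_sents):
    """Predicted response gap between explanation-matching and control sentences."""
    match_pred = model.predict(np.stack([embed(s) for s in matching_sents]))
    ctrl_pred = model.predict(np.stack([embed(s) for s in control_sents]))
    return match_pred.mean() - ctrl_pred.mean()

In the framework described in the abstract, the decisive step is the follow-up experiment: newly generated explanation-matching stimuli are presented to participants so that the explanation is validated against measured, not merely predicted, brain activity.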