A Framework to Learn with Interpretation

Jan-19-2025, 05:02:53 GMT–Neural Information Processing Systems

To tackle interpretability in deep learning, we present a novel framework to jointly learn a predictive model and its associated interpretation model. The interpreter provides both local and global interpretability of the predictive model in terms of human-understandable, high-level attribute functions, with minimal loss of accuracy. This is achieved by a dedicated architecture and well-chosen regularization penalties. We seek a small dictionary of high-level attribute functions that take as inputs the outputs of selected hidden layers and whose outputs feed a linear classifier. We impose strong conciseness on the activation of attributes with an entropy-based criterion while enforcing fidelity to both inputs and outputs of the predictive model.
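As a rough illustration of how these pieces could fit together, the sketch below combines a linear classifier on attribute activations, an output-fidelity term, and an entropy-based conciseness term. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the softmax normalization of activations, the cross-entropy form of output fidelity, and the weight `beta` are all hypothetical choices, and the input-fidelity term (which would need a decoder from attributes back to the input) is left out.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs, eps=1e-12):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p + eps) for p in probs)


def linear_interpreter(attributes, weights):
    """Linear classifier on attribute activations (one weight row per class)."""
    logits = [sum(w * a for w, a in zip(row, attributes)) for row in weights]
    return softmax(logits)


def output_fidelity(predictor_probs, interpreter_probs, eps=1e-12):
    """Assumed form: cross-entropy of the interpreter against the predictor's top class."""
    target = max(range(len(predictor_probs)), key=predictor_probs.__getitem__)
    return -math.log(interpreter_probs[target] + eps)


def conciseness(attributes):
    """Assumed form: entropy of softmax-normalized activations.

    The value is low when a few attributes dominate and high when
    activations are spread evenly across the dictionary.
    """
    return entropy(softmax(attributes))


def interpreter_loss(attributes, weights, predictor_probs, beta=0.1):
    """Output fidelity plus a weighted conciseness penalty (beta is hypothetical)."""
    interp_probs = linear_interpreter(attributes, weights)
    return output_fidelity(predictor_probs, interp_probs) + beta * conciseness(attributes)
```

In this sketch, `attributes` stands in for the outputs of the attribute functions computed from selected hidden layers. Minimizing `conciseness` pushes each sample toward a few active attributes, while `output_fidelity` keeps the interpreter's prediction aligned with the predictive model's.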

interpretability, interpretation, predictive model

Technology:
- Information Technology
  - Modeling & Simulation (1.00)
  - Artificial Intelligence > Machine Learning
    - Neural Networks (0.83)