Towards Robust Interpretability with Self-Explaining Neural Networks

Mar-16-2026, 19:55:35 GMT–Neural Information Processing Systems

Most recent work on the interpretability of complex machine learning models has focused on estimating a-posteriori explanations for previously trained models around specific predictions. Self-explaining models, in which interpretability already plays a key role during learning, have received much less attention. We propose three desiderata for explanations in general (explicitness, faithfulness, and stability) and show that existing methods do not satisfy them.

artificial intelligence, machine learning, proceedings

Technology:
- Information Technology > Artificial Intelligence > Machine Learning (1.00)