770f8e448d07586afbf77bb59f698587-AuthorFeedback.pdf

Neural Information Processing Systems

Thank you for your thoughtful feedback. We will first discuss common themes and then specific reviewer comments. Even though ExpO is "simple" (in that it connects existing concepts, albeit in a novel way), we believe ... We will add a discussion as outlined below. "..." by Qin et al. does not consider interpretability at all. Several methods rely on domain knowledge: "Learning credible ..."


Self-Explaining Reinforcement Learning for Mobile Network Resource Allocation

Nowosadko, Konrad, Ruggeri, Franco, Terra, Ahmad

arXiv.org Artificial Intelligence

Abstract--Reinforcement Learning (RL) methods that incorporate deep neural networks (DNNs), though powerful, often lack transparency. Their black-box characteristic hinders interpretability and reduces trustworthiness, particularly in critical domains. To address this challenge in RL tasks, we propose a solution based on Self-Explaining Neural Networks (SENNs) along with explanation extraction methods to enhance interpretability while maintaining predictive accuracy. Our approach targets low-dimensionality problems to generate robust local and global explanations of the model's behaviour. We evaluate the proposed method on the resource allocation problem in mobile networks, demonstrating that SENNs can constitute interpretable solutions with competitive performance. This work highlights the potential of SENNs to improve transparency and trust in AI-driven decision-making for low-dimensional tasks. Interest in Explainable Artificial Intelligence (XAI) has been rapidly growing, facilitated by the need for transparency. Although powerful, Deep Neural Network (DNN) models often operate as black boxes, making it difficult to interpret their decisions, leading to a lack of trust among stakeholders and consequently hindering their applicability.
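
As a rough illustration of how such a self-explaining agent might look on a low-dimensional state (a sketch only; STATE_DIM, N_ACTIONS, relevance_net, and the use of identity concepts h(s) = s are assumptions, not the paper's architecture), each Q-value can be written as a state-dependent linear combination of the raw state features, so the per-feature contributions double as a local explanation:

```python
import torch
import torch.nn as nn

# Illustrative sketch, not the paper's exact architecture: a self-explaining
# Q-network over a low-dimensional resource-allocation state. With identity
# concepts h(s) = s, each Q-value is a state-dependent linear combination of
# raw state features, and per-feature contributions act as local explanations.

STATE_DIM, N_ACTIONS = 6, 3   # assumed sizes for the sketch

relevance_net = nn.Sequential(               # theta(s): one coefficient per (action, feature)
    nn.Linear(STATE_DIM, 32), nn.ReLU(),
    nn.Linear(32, N_ACTIONS * STATE_DIM),
)

def q_values_and_explanation(state):
    theta = relevance_net(state).view(N_ACTIONS, STATE_DIM)
    contributions = theta * state             # contribution of each feature to each Q-value
    q = contributions.sum(-1)                 # Q(s, a) = sum_j theta_aj(s) * s_j
    return q, contributions

state = torch.rand(STATE_DIM)                 # e.g. normalised load / utilisation metrics
q, expl = q_values_and_explanation(state)
best = q.argmax().item()
print("chosen action:", best)
print("feature contributions:", expl[best].tolist())   # local explanation for that action
```

Because the explanation is the model's own decomposition rather than a post-hoc attribution, it stays faithful to the learned Q-values by construction.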


Self-Expanding Neural Networks

Mitchell, Rupert, Mundt, Martin, Kersting, Kristian

arXiv.org Artificial Intelligence

The results of training a neural network are heavily dependent on the architecture chosen; and even a modification of only the size of the network, however small, typically involves restarting the training process. In contrast to this, we begin training with a small architecture, only increase its capacity as necessary for the problem, and avoid interfering with previous optimization while doing so. We thereby introduce a natural gradient based approach which intuitively expands both the width and depth of a neural network when this is likely to substantially reduce the hypothetical converged training loss. We prove an upper bound on the "rate" at which neurons are added, and a computationally cheap lower bound on the expansion score. We illustrate the benefits of such Self-Expanding Neural Networks in both classification and regression problems, including those where the appropriate architecture size is substantially uncertain a priori.
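
A minimal sketch of the width-expansion step described above, under two simplifying assumptions: a placeholder expansion score stands in for the paper's natural-gradient criterion, and the new neuron's outgoing weights are zeroed so that the network's current function, and hence previous optimization, is left undisturbed (widen and the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

def widen(linear_in: nn.Linear, linear_out: nn.Linear):
    """Return copies of two adjacent layers with one extra hidden unit."""
    new_in = nn.Linear(linear_in.in_features, linear_in.out_features + 1)
    new_out = nn.Linear(linear_out.in_features + 1, linear_out.out_features)
    with torch.no_grad():
        new_in.weight[:-1] = linear_in.weight
        new_in.bias[:-1] = linear_in.bias
        nn.init.normal_(new_in.weight[-1:], std=0.1)   # fresh incoming weights
        new_in.bias[-1] = 0.0
        new_out.weight[:, :-1] = linear_out.weight
        new_out.weight[:, -1] = 0.0                    # zero outgoing weight => no function change
        new_out.bias.copy_(linear_out.bias)
    return new_in, new_out

l1, l2 = nn.Linear(4, 8), nn.Linear(8, 1)
x = torch.randn(5, 4)
before = l2(torch.tanh(l1(x)))

expansion_score = 1.0          # placeholder for the natural-gradient-based criterion
if expansion_score > 0.5:      # expand only when it is likely to substantially help
    l1, l2 = widen(l1, l2)

after = l2(torch.tanh(l1(x)))
assert torch.allclose(before, after)   # expansion preserves the currently learned function
```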


Concept Bottleneck Model with Additional Unsupervised Concepts

Sawada, Yoshihide, Nakamura, Keigo

arXiv.org Artificial Intelligence

With the increasing demands for accountability, interpretability is becoming an essential capability for real-world AI applications. However, most methods utilize post-hoc approaches rather than training the interpretable model. In this article, we propose a novel interpretable model based on the concept bottleneck model (CBM). CBM uses concept labels to train an intermediate layer as the additional visible layer. However, because the number of concept labels restricts the dimension of this layer, it is difficult to obtain high accuracy with a small number of labels. To address this issue, we integrate supervised concepts with unsupervised ones trained with self-explaining neural networks (SENNs). By seamlessly training these two types of concepts while reducing the amount of computation, we can obtain both supervised and unsupervised concepts simultaneously, even for large-sized images. We refer to the proposed model as the concept bottleneck model with additional unsupervised concepts (CBM-AUC). We experimentally confirmed that the proposed model outperformed CBM and SENN. We also visualized the saliency map of each concept and confirmed that it was consistent with the semantic meanings.
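
A minimal sketch of the bottleneck structure described above, assuming illustrative layer sizes, head names, and loss weighting rather than the authors' exact architecture: supervised concepts trained against concept labels are concatenated with additional unsupervised concepts, and the class label is predicted from the combined bottleneck:

```python
import torch
import torch.nn as nn

IN_DIM, N_SUP, N_UNSUP, N_CLASSES = 32, 5, 3, 4   # assumed sizes for the sketch

encoder = nn.Sequential(nn.Linear(IN_DIM, 64), nn.ReLU())
sup_concept_head = nn.Linear(64, N_SUP)       # aligned with human-annotated concept labels
unsup_concept_head = nn.Linear(64, N_UNSUP)   # additional concepts learned without labels
classifier = nn.Linear(N_SUP + N_UNSUP, N_CLASSES)

def forward(x):
    z = encoder(x)
    c_sup = sup_concept_head(z)
    c_unsup = unsup_concept_head(z)
    bottleneck = torch.cat([c_sup, c_unsup], dim=-1)   # combined visible layer
    return classifier(bottleneck), c_sup

x = torch.randn(16, IN_DIM)
y = torch.randint(0, N_CLASSES, (16,))
concept_labels = torch.rand(16, N_SUP)        # soft/binary concept annotations

logits, c_sup = forward(x)
# 0.5 is an illustrative weighting of the concept-supervision term
loss = nn.functional.cross_entropy(logits, y) \
     + 0.5 * nn.functional.binary_cross_entropy_with_logits(c_sup, concept_labels)
loss.backward()
```

Both concept heads are trained jointly in a single pass, which is what lets the unsupervised concepts widen the bottleneck without requiring extra concept labels.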


Towards Robust Interpretability with Self-Explaining Neural Networks

Alvarez-Melis, David, Jaakkola, Tommi

Neural Information Processing Systems

Most recent work on interpretability of complex machine learning models has focused on estimating a posteriori explanations for previously trained models around specific predictions. Self-explaining models, where interpretability plays a key role already during learning, have received much less attention. We propose three desiderata for explanations in general -- explicitness, faithfulness, and stability -- and show that existing methods do not satisfy them. In response, we design self-explaining models in stages, progressively generalizing linear classifiers to complex yet architecturally explicit models. Faithfulness and stability are enforced via regularization specifically tailored to such models. Experimental results across various benchmark datasets show that our framework offers a promising direction for reconciling model complexity and interpretability.
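
A minimal sketch of the generalized-linear, self-explaining form and the kind of regularizer the abstract alludes to, with illustrative module names and sizes (concept_net, relevance_net, K_CONCEPTS are assumptions): the prediction is a locally linear combination of learned concepts, and a penalty ties the model's input gradient to the relevance scores so that the relevances behave as stable, faithful local coefficients:

```python
import torch
import torch.nn as nn

IN_DIM, K_CONCEPTS = 8, 4   # assumed sizes for the sketch

concept_net = nn.Sequential(nn.Linear(IN_DIM, 16), nn.Tanh(), nn.Linear(16, K_CONCEPTS))    # h(x)
relevance_net = nn.Sequential(nn.Linear(IN_DIM, 16), nn.Tanh(), nn.Linear(16, K_CONCEPTS))  # theta(x)

def senn_forward(x):
    """Prediction is a locally linear combination of concepts: f(x) = sum_i theta_i(x) h_i(x)."""
    h = concept_net(x)
    theta = relevance_net(x)
    return (theta * h).sum(-1), h, theta

def robustness_penalty(x):
    """Penalise the gap between grad_x f(x) and theta(x)^T J_h(x),
    encouraging theta(x) to act as stable local coefficients."""
    x = x.clone().requires_grad_(True)
    f, _, theta = senn_forward(x)
    grad_f = torch.autograd.grad(f, x, create_graph=True)[0]
    J_h = torch.autograd.functional.jacobian(lambda z: concept_net(z), x, create_graph=True)
    return ((grad_f - theta @ J_h) ** 2).sum()

x = torch.randn(IN_DIM)
penalty = robustness_penalty(x)      # added to the task loss during training
penalty.backward()
```

In practice this penalty would be weighted against the usual task loss, trading a little accuracy for explanations that change smoothly with the input.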

