Supplementary Material for GPEX, A Framework For Interpreting Artificial Neural Networks

Amir Akbarnejad, Gilbert Bigras, Nilanjan Ray

Neural Information Processing Systems

Fig. S1: The proposed framework as a probabilistic graphical model.

In this section we derive the variational lower-bound introduced in Sec.2.3 of the main article. We first introduce Lemmas 1 and 2, as they appear in our derivations.

As illustrated in Fig.S1, the ANN's input [...] In Fig.S1 the lower boxes are the inducing points and other variables that determine the GPs' posterior.

S1.1 Deriving the Lower-bound With Respect to the Kernel-mappings

In the right-hand side of Eq.S6, only the following terms depend on the kernel-mappings: [...]

The first term is the expected log-likelihood of a Gaussian distribution (i.e. the conditional log-likelihood of [...]). Therefore, we can use Lemma.2 to simplify the first term: [...]

According to Lemma.1 we have that [...] Therefore, the KL-term of Eq.S8 is constant with respect to the kernel-mappings. All in all, the lower-bound for optimizing the kernel-mappings is equal to the right-hand side of Eq.S9, which was introduced and discussed in Sec.2.3 of the main article.

S1.2 Deriving the Lower-bound With Respect to the ANN Parameters

According to Eq.4 of the main article, in our formulation the ANN's parameters appear as variational parameters. Therefore, the likelihood of all variables (Eq.S6) does not generally depend on the ANN's parameters. This likelihood turns out to be equivalent to commonly-used losses like the cross-entropy loss or the mean-squared loss. Here we elaborate upon how this happens. [...] This conclusion was introduced and discussed in Eq.6 of the main article. We can draw similar conclusions when the pipeline is for other tasks like regression, or even a combination of tasks.
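The correspondence between likelihoods and the commonly-used losses mentioned in S1.2 can be checked numerically. The sketch below is not the derivation from the main article. It only illustrates two standard identities, under the assumption of a Gaussian likelihood with fixed variance (regression) and a categorical likelihood over softmax outputs (classification). All function names and numeric values are hypothetical and chosen for illustration.

```python
import math


def gaussian_nll(y, mean, var):
    """Negative log-density of targets y under independent N(mean_i, var)."""
    return sum(
        0.5 * math.log(2.0 * math.pi * var) + (yi - mi) ** 2 / (2.0 * var)
        for yi, mi in zip(y, mean)
    )


def squared_error(y, mean):
    """Sum of squared errors between targets and predictions."""
    return sum((yi - mi) ** 2 for yi, mi in zip(y, mean))


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_nll(label, logits):
    """Negative log-probability of the true class under softmax(logits)."""
    return -math.log(softmax(logits)[label])


def cross_entropy(one_hot, logits):
    """Cross-entropy between a one-hot target and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# Hypothetical regression example: fixed variance, three targets.
y = [0.3, -1.2, 2.0]
pred = [0.1, -1.0, 2.5]
var = 0.5

# Gaussian NLL = squared error / (2 * var) + a constant independent of pred.
const = len(y) * 0.5 * math.log(2.0 * math.pi * var)
gaussian_gap = abs(gaussian_nll(y, pred, var)
                   - (squared_error(y, pred) / (2.0 * var) + const))

# Hypothetical classification example: three classes, true class 0.
logits = [2.0, -0.5, 0.1]
label = 0
one_hot = [1.0, 0.0, 0.0]

# Categorical NLL coincides with cross-entropy against the one-hot target.
categorical_gap = abs(categorical_nll(label, logits)
                      - cross_entropy(one_hot, logits))

print(gaussian_gap, categorical_gap)
```

Both gaps vanish up to floating-point rounding, so under these assumptions minimizing the negative log-likelihood with respect to the predictions is the same as minimizing a scaled squared error or the cross-entropy.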