Supplementary Material for GPEX, A Framework For Interpreting Artificial Neural Networks
Amir Akbarnejad, Gilbert Bigras, Nilanjan Ray
Fig. S1: The proposed framework as a probabilistic graphical model. In Fig. S1 the lower boxes are the inducing points and the other variables that determine the GPs' posterior.

In this section we derive the variational lower-bound introduced in Sec. 2.3 of the main article. We first introduce Lemmas 1 and 2, as they appear in our derivations.

S1.1 Deriving the Lower-bound With Respect to the Kernel-mappings

In the right-hand side of Eq. S6, only the following terms depend on the kernel-mappings. The first term is the expected log-likelihood of a Gaussian distribution (i.e., the conditional log-likelihood), so we can use Lemma 2 to simplify it. According to Lemma 1, the KL-term of Eq. S8 is a constant with respect to the kernel-mappings. All in all, the lower-bound for optimizing the kernel-mappings is equal to the right-hand side of Eq. S9, which was introduced and discussed in Sec. 2.3 of the main article.

S1.2 Deriving the Lower-bound With Respect to the ANN Parameters

According to Eq. 4 of the main article, in our formulation the ANN's parameters appear as variational parameters. Therefore, the likelihood of all variables (Eq. S6) does not generally depend on the ANN's parameters. This likelihood turns out to be equivalent to commonly used losses such as the cross-entropy loss or the mean-squared loss; here we elaborate upon how this happens. This conclusion was introduced and discussed in Eq. 6 of the main article. We can draw similar conclusions when the pipeline is built for other tasks, such as regression, or even a combination of tasks.
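To make the equivalence claimed in S1.2 concrete without reproducing Eq. S6 (whose exact form is not shown here), the following is a generic sketch with illustrative symbols: a Gaussian conditional likelihood reduces to a mean-squared loss, and a categorical likelihood to cross-entropy.

```latex
% Illustrative only; y, x, f, \sigma are generic, not the symbols of Eq. S6.
% Gaussian likelihood -> mean-squared loss:
\log \mathcal{N}\!\left(y \mid f(x), \sigma^2 I\right)
  = -\frac{1}{2\sigma^2}\,\lVert y - f(x) \rVert^2 + \mathrm{const},
% so maximizing this log-likelihood minimizes the squared error.
% Categorical likelihood -> cross-entropy (y is one-hot over classes c):
\log p(y \mid x) = \sum_{c} y_c \,\log \mathrm{softmax}\!\left(f(x)\right)_c,
% whose negation is exactly the cross-entropy loss.
```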
Flow Factorized Representation Learning - Supplementary Material - Yue Song 1,2, Andy Keller 2, Nicu Sebe 1, and Max Welling 2
Here we omit the computation of HJ PDEs for brevity. The models are each trained for 90,000 iterations. For the disentanglement methods, we substantially enlarge the original MNIST dataset by adding the transformed images of the whole sequence. The generalization ability (i.e., validation accuracy) can thus be regarded as a reasonable surrogate for the disentanglement ability.
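The enrichment step above can be sketched as follows. This is a minimal illustration, not the paper's pipeline: `enrich_with_sequence` and the cyclic-shift stand-in for the learned flow transformation are hypothetical names introduced here, and every intermediate frame of each transformation sequence is kept as an extra training sample.

```python
import numpy as np

def enrich_with_sequence(images, transform, seq_len=8):
    """For each image, apply `transform` repeatedly and keep every
    intermediate frame, so one image yields `seq_len` samples.
    `transform` is a placeholder for the learned transformation."""
    enriched = []
    for img in images:
        frame = img
        for _ in range(seq_len):
            enriched.append(frame)
            frame = transform(frame)
    return np.stack(enriched)

# Toy example: a cyclic pixel shift stands in for the real transformation.
shift = lambda x: np.roll(x, 1, axis=1)
toy = np.random.rand(10, 28, 28)   # 10 fake 28x28 "MNIST" images
out = enrich_with_sequence(toy, shift, seq_len=8)
print(out.shape)  # (80, 28, 28): each image contributes 8 frames
```

The first frame of each sequence is the untransformed image itself, so the original dataset is a subset of the enriched one.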