Supplementary Materials for: Max-Sliced Mutual Information

Neural Information Processing Systems

A Proofs

A.1 Proof of Proposition 1. Part 1 is restated from, and was proved in, [25, Appendix A.1]. Proof of Part 2: non-negativity follows directly from the non-negativity of mutual information. Proof of Part 5: the proof relies on the fact that functions of independent random variables are themselves independent. This concludes the proof.

A.2 Proof of Proposition 2. By translation invariance of mutual information, we may assume w.l.o.g. that the means are zero. Next, we show that we may equivalently optimize with the added unit-variance constraint. By [..., Example 3.4], we have $I(A;B) = -\tfrac{1}{2}\log\bigl(1-\rho_{AB}^{2}\bigr)$, where the last equality uses the unit-variance property and Schur's determinant formula. Armed with Lemma 1, we are in place to prove Proposition 2. Since the CCA solutions [...], we invoke [..., Theorem 2.2], which is restated next for completeness.
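The reconstructed identity above is a standard Gaussian computation; the following LaTeX derivation is a sketch under the assumption, not fixed by the snippet, that A and B are jointly Gaussian scalars with unit variances and correlation coefficient rho_AB:

    % Sketch, assuming A, B jointly Gaussian with Var(A) = Var(B) = 1
    % and correlation rho_AB; the notation is ours, not the paper's.
    \begin{align*}
    \Sigma &= \begin{pmatrix} 1 & \rho_{AB} \\ \rho_{AB} & 1 \end{pmatrix},\\
    I(A;B) &= \frac{1}{2}\log\frac{\operatorname{Var}(A)\,\operatorname{Var}(B)}{\det\Sigma}
            = -\frac{1}{2}\log\det\Sigma && \text{(unit variances)}\\
           &= -\frac{1}{2}\log\Bigl(\Sigma_{11}\bigl(\Sigma_{22}-\Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}\bigr)\Bigr) && \text{(Schur's determinant formula)}\\
           &= -\frac{1}{2}\log\bigl(1-\rho_{AB}^{2}\bigr).
    \end{align*}

Maximizing this quantity over unit-variance projections reduces to maximizing |rho_AB|, which is presumably where the CCA solutions enter the proof of Proposition 2.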





Functional Equivalence and Path Connectivity of Reducible Hyperbolic Tangent Networks

Neural Information Processing Systems

Understanding the learning process of artificial neural networks requires clarifying the structure of the parameter space within which learning takes place. A neural network parameter's functional equivalence class is the set of parameters implementing the same input-output function. For many architectures, almost all parameters have a simple and well-documented functional equivalence class. However, there is also a vanishing minority of reducible parameters, with richer functional equivalence classes caused by redundancies among the network's units. In this paper, we give an algorithmic characterisation of unit redundancies and reducible functional equivalence classes for a single-hidden-layer hyperbolic tangent architecture. We show that such functional equivalence classes are piecewise-linear path-connected sets, and that for parameters with a majority of redundant units, the sets have a diameter of at most 7 linear segments.
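A minimal numerical sketch of the two phenomena the abstract distinguishes, in NumPy (the function tanh_net and all parameter names are illustrative, not from the paper): the sign-flip symmetry that every tanh unit admits, which generates the simple, well-documented equivalence class of a generic parameter, and a duplicate-unit redundancy, which enlarges the equivalence class of a reducible parameter because the shared output weight can be split arbitrarily between the duplicated units.

    import numpy as np

    def tanh_net(x, W, b, v, c):
        """Single-hidden-layer tanh network: f(x) = v @ tanh(W x + b) + c."""
        return v @ np.tanh(W @ x + b) + c

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 2)); b = rng.normal(size=3)
    v = rng.normal(size=3); c = 0.5

    # Sign-flip symmetry: tanh is odd, so negating one unit's incoming
    # weights and bias together with its outgoing weight leaves the
    # implemented function unchanged.
    W2, b2, v2 = W.copy(), b.copy(), v.copy()
    W2[0], b2[0], v2[0] = -W[0], -b[0], -v[0]

    # Duplicate-unit redundancy: appending a copy of unit 0 and splitting
    # its outgoing weight (here 0.3/0.7) across the two copies also leaves
    # the function unchanged, for any split summing to the original weight.
    W3 = np.vstack([W, W[0]]); b3 = np.append(b, b[0])
    v3 = np.append(v, 0.0)
    v3[0], v3[-1] = 0.3 * v[0], 0.7 * v[0]

    x = rng.normal(size=2)
    print(np.isclose(tanh_net(x, W, b, v, c), tanh_net(x, W2, b2, v2, c)))  # True
    print(np.isclose(tanh_net(x, W, b, v, c), tanh_net(x, W3, b3, v3, c)))  # True

Permuting hidden units is the other well-documented generic symmetry; the paper's algorithmic characterisation concerns redundancy patterns like the duplicated unit sketched here, which give reducible parameters their richer equivalence classes.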