asymptotic freeness


Free Probability, Newton lilypads and Jacobians of neural networks

arXiv.org Artificial Intelligence

Gradient descent during the learning process of a neural network can be subject to many instabilities. The spectral density of the Jacobian is a key component for analyzing robustness. Following the work of Pennington et al., such Jacobians are modeled using free multiplicative convolutions from Free Probability Theory. We present a reliable and very fast method for computing the associated spectral densities, with controlled and proven convergence. Our technique is based on an adaptive Newton-Raphson scheme that finds and chains basins of attraction: the Newton algorithm finds contiguous lilypad-like basins and steps from one to the next, heading towards the objective. We demonstrate the applicability of our method by using it to assess how the learning process is affected by network depth, layer widths and initialization choices: empirically, final test losses are strongly correlated with our Free Probability metrics.
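The paper's algorithm for free multiplicative convolutions is not reproduced here. As a simplified illustration of chaining Newton-Raphson solves, the sketch below computes the Marchenko-Pastur density, an assumed single-matrix stand-in, from the quadratic equation c z G² − (z − 1 + c) G + 1 = 0 satisfied by its Cauchy transform G(z), with density ρ(x) = −Im G(x + iη)/π for small η. The point z = x + iη moves toward the real axis, each Newton solve is warm-started from the previous root, and the step is shortened whenever Newton fails or lands on the unphysical root (Im G ≥ 0). The function name mp_newton_density and all step-size parameters are hypothetical.

```python
import math


def mp_newton_density(x, c, eta_start=10.0, eta_final=1e-4, tol=1e-12):
    """Marchenko-Pastur density (ratio c, unit variance) at x via chained Newton solves.

    Hypothetical helper: walks z = x + i*eta from eta_start down to eta_final,
    warm-starting each Newton solve from the previous root.
    """

    def f(g, z):
        # Self-consistent equation for the Cauchy transform G(z) = E[1 / (z - t)]
        return c * z * g * g - (z - 1 + c) * g + 1

    def df(g, z):
        return 2 * c * z * g - (z - 1 + c)

    def newton(g, z, max_iter=50):
        for _ in range(max_iter):
            step = f(g, z) / df(g, z)
            g -= step
            if abs(step) < tol:
                return g
        return None  # no convergence: the guess was outside the root's basin

    eta = eta_start
    g = 1 / complex(x, eta)  # G(z) is close to 1/z far from the real axis
    ratio = 0.5  # multiplicative step in eta, adapted on the fly
    while eta > eta_final:
        new_eta = max(eta * ratio, eta_final)
        candidate = newton(g, complex(x, new_eta))
        if candidate is None or candidate.imag >= 0:
            # Failed, or landed on the unphysical root (Im G must be < 0): shorten the step.
            ratio = math.sqrt(ratio)
            if ratio > 1 - 1e-9:
                raise RuntimeError("continuation stalled")
            continue
        g, eta = candidate, new_eta
        ratio = max(ratio * ratio, 0.2)  # success: try a longer step next time
    return -g.imag / math.pi
```

For c = 0.5, `mp_newton_density(1.0, 0.5)` should be close to the closed form √((b − x)(x − a))/(2πcx) with a, b = (1 ∓ √c)², about 0.421. For a quadratic, Newton converges to whichever root is closer to the starting point, so keeping each step short enough that the previous root stays closer to the new physical root is what keeps the iteration in the right basin.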


Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case

arXiv.org Machine Learning

Free Probability Theory (FPT) provides rich knowledge for handling mathematical difficulties caused by the random matrices that appear in research on deep neural networks (DNNs), such as dynamical isometry, the Fisher information matrix, and training dynamics. FPT suits this research because the DNN's parameter-Jacobian and input-Jacobian are polynomials of layerwise Jacobians. However, the critical assumption, namely the asymptotic freeness of the layerwise Jacobians, has not been completely proven so far. This assumption plays a fundamental role in this research, since it is what allows spectral distributions to be propagated through the layers. In the present work, we prove the asymptotic freeness of the layerwise Jacobians of multilayer perceptrons with Haar-distributed orthogonal weight matrices, which are essential for achieving dynamical isometry.
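Asymptotic freeness can be probed numerically. A minimal sketch, assuming NumPy: for a fixed diagonal matrix A and B = O D Oᵀ, with O Haar-distributed orthogonal (sampled by QR of a Gaussian matrix with a sign correction), the mixed moment should match the free-probability prediction φ(ABAB) ≈ φ(A²)φ(B)² + φ(A)²φ(B²) − φ(A)²φ(B)², where φ is the normalized trace. The spectra and size are arbitrary choices, the helper names are hypothetical, and this checks a single moment for independent matrices, not the layerwise-Jacobian setting the paper proves.

```python
import numpy as np


def haar_orthogonal(n, rng):
    """Haar-distributed n x n orthogonal matrix: QR of a Gaussian matrix plus sign fix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # rescale column j by sign(R[j, j])


def mixed_moment_check(n=800, seed=0):
    """Compare tr(ABAB)/n with the free-probability prediction for A and O D O^T."""
    rng = np.random.default_rng(seed)
    a = np.diag(rng.uniform(0.0, 2.0, n))   # arbitrary spectrum for A
    d = rng.choice([0.5, 1.5], size=n)      # arbitrary spectrum for B
    o = haar_orthogonal(n, rng)
    b = o @ np.diag(d) @ o.T                # Haar rotation: B asymptotically free from A

    def tr(m):
        return np.trace(m) / n              # normalized trace phi

    a1, a2 = tr(a), tr(a @ a)
    b1, b2 = tr(b), tr(b @ b)
    empirical = tr(a @ b @ a @ b)
    predicted = a2 * b1**2 + a1**2 * b2 - a1**2 * b1**2
    return empirical, predicted
```

Without the Haar rotation (B diagonal and commuting with A), the same moment would instead be φ(A²B²), so the agreement genuinely depends on the random rotation.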


The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry

arXiv.org Machine Learning

The Fisher information matrix (FIM) is fundamental for understanding the trainability of deep neural networks (DNNs) since it describes the local metric of the parameter space. We investigate the spectral distribution of the FIM given a single input, focusing on fully-connected networks achieving dynamical isometry. While dynamical isometry is known to keep specific backpropagated signals independent of depth, we find that the local metric of the parameter space depends on depth. In particular, we obtain an exact expression for the spectrum of the FIM given a single input and reveal that it concentrates around the depth point. Considering random initialization and the wide limit, we construct an algebraic methodology to examine the spectrum based on free probability theory, the algebraic wrapper of random matrix theory. As a byproduct, we provide the solvable spectral distribution in the two-hidden-layer case. Lastly, we empirically confirm that the FIM spectrum with a small batch size has the same property as the single-input version. An experimental result shows that the FIM's dependence on depth determines the appropriate learning rate for convergence in the initial phase of online training of DNNs.
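A toy case makes the depth dependence concrete. Assuming a deep linear network f(x) = W_L ⋯ W_1 x with orthogonal weights (not the nonlinear networks studied in the paper) and a unit-variance Gaussian output model, the FIM for a single input is F = JᵀJ, where J is the Jacobian of the output with respect to all weights, and its nonzero eigenvalues coincide with those of JJᵀ. Here JJᵀ = Σ_l ‖h_{l−1}‖² B_l B_lᵀ with B_l = W_L ⋯ W_{l+1} and h_{l−1} = W_{l−1} ⋯ W_1 x, which equals L·I when ‖x‖ = 1, so every nonzero eigenvalue equals the depth L. The sketch below, with hypothetical helper names, checks this using central finite differences.

```python
import numpy as np


def haar_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix makes the QR factor Haar-distributed


def forward(weights, x):
    h = x
    for w in weights:
        h = w @ h  # linear layer, no activation (toy assumption)
    return h


def parameter_jacobian(weights, x, eps=1e-6):
    """Central finite differences of the output w.r.t. every weight entry."""
    cols = []
    for l, w in enumerate(weights):
        for idx in np.ndindex(w.shape):
            plus = [v.copy() for v in weights]
            minus = [v.copy() for v in weights]
            plus[l][idx] += eps
            minus[l][idx] -= eps
            cols.append((forward(plus, x) - forward(minus, x)) / (2 * eps))
    return np.stack(cols, axis=1)  # shape (n, depth * n * n)


def fim_nonzero_spectrum(n=10, depth=4, seed=0):
    rng = np.random.default_rng(seed)
    weights = [haar_orthogonal(n, rng) for _ in range(depth)]
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)  # single input with ||x|| = 1
    jac = parameter_jacobian(weights, x)
    # F = J^T J (size depth*n^2) shares its nonzero eigenvalues with J J^T (size n)
    return np.linalg.eigvalsh(jac @ jac.T)
```

With the defaults, all ten nonzero eigenvalues should equal the depth 4 up to finite-difference rounding, while the remaining depth·n² − n eigenvalues of F are zero.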


Understanding the dynamics of message passing algorithms: a free probability heuristics

arXiv.org Machine Learning

A major task in probabilistic inference is to compute statistics of unobserved random variables using distributions of these variables conditioned on observed data. An exact computation of the corresponding expectations in the multivariate case is usually not possible except in simple cases. Hence, one has to resort to methods that approximate the necessary high-dimensional sums or integrals and that are often based on ideas from statistical physics [1]. A class of such approximation algorithms is often termed message passing. Prominent examples are belief propagation [2], which was developed for inference in probabilistic Bayesian networks with sparse couplings, and expectation propagation (EP), which is also applicable to networks with dense coupling matrices [3]. Both types of algorithms assume weak dependencies between random variables, which motivates approximating certain expectations by Gaussian random variables via central limit theorem arguments [4]. Using ideas from the statistical physics of disordered systems, such arguments can be justified for the fixed points of these algorithms in large network models whose couplings are drawn from random, rotation-invariant matrix distributions. This extra assumption of randomness allows further simplifications of message passing approaches [5, 6], leading, e.g., to the approximate message passing (AMP) or vector AMP (VAMP) algorithms; see [7, 8, 9].
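For readers new to AMP, here is a minimal sketch, assuming NumPy, of the classic compressed-sensing AMP iteration with soft thresholding for recovering a sparse x from y = Ax with i.i.d. Gaussian A. It is a swapped-in textbook example, not the free-probability analysis of the paper. Each step thresholds the pseudo-data x + Aᵀz and adds the Onsager correction term to the residual, which is what distinguishes AMP from plain iterative thresholding. The threshold rule alpha·‖z‖/√m, the function names and all parameter values are assumptions.

```python
import numpy as np


def soft_threshold(v, theta):
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)


def amp_sparse_recovery(a, y, alpha=1.5, iters=100):
    """Soft-thresholding AMP for y = A x with sparse x (illustrative sketch)."""
    m, n = a.shape
    x = np.zeros(n)
    z = y.copy()
    for _ in range(iters):
        # Threshold proportional to the estimated effective noise level ||z|| / sqrt(m)
        theta = alpha * np.linalg.norm(z) / np.sqrt(m)
        x_new = soft_threshold(x + a.T @ z, theta)
        # Onsager term: (n/m) * mean(eta') * z = (#nonzeros / m) * z for soft thresholding
        onsager = (np.count_nonzero(x_new) / m) * z
        z = y - a @ x_new + onsager
        x = x_new
    return x
```

Dropping the onsager term turns this into plain iterative soft thresholding, which typically needs many more iterations; the correction is what keeps the effective noise in x + Aᵀz approximately Gaussian, in line with the central limit arguments mentioned above.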


Almost Surely Asymptotic Freeness for Jacobian Spectrum of Deep Network

arXiv.org Machine Learning

Free probability theory helps us understand the Jacobian spectrum of deep neural networks. We rigorously show the almost sure asymptotic freeness of layerwise Jacobians.
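The almost sure statement concerns individual realizations. A rough single-realization check, assuming NumPy, a tanh MLP x_l = tanh(W_l x_{l−1}) with W_l = σ_w O_l and O_l Haar orthogonal, and a Gaussian input (all hypothetical choices): the input-output Jacobian is J = D_L W_L ⋯ D_1 W_1 with D_l = diag(tanh′(h_l)). The D_l depend on the weights through the preactivations h_l, which is exactly the dependence such freeness results must handle. If the layerwise factors are asymptotically free, the first moment factorizes as tr(JJᵀ)/N ≈ Π_l σ_w² φ(D_l²), since W_l W_lᵀ = σ_w² I. This sketch only tests that first moment for one sampled network.

```python
import numpy as np


def haar_orthogonal(n, rng):
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))  # sign fix makes the QR factor Haar-distributed


def jacobian_first_moment(n=500, depth=3, sigma_w=1.2, seed=0):
    """Return (tr(J J^T)/n, freeness prediction) for one sampled tanh MLP."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)
    jac = np.eye(n)
    predicted = 1.0
    for _ in range(depth):
        w = sigma_w * haar_orthogonal(n, rng)
        h = w @ x                          # preactivation h_l
        d = 1.0 - np.tanh(h) ** 2          # diagonal of D_l = tanh'(h_l)
        jac = (d[:, None] * w) @ jac       # J_l = D_l W_l J_{l-1}
        predicted *= sigma_w**2 * np.mean(d**2)
        x = np.tanh(h)
    empirical = np.trace(jac @ jac.T) / n
    return empirical, predicted
```

Increasing n should shrink the gap between the two returned values for a single draw, which is the behavior an almost sure result describes.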