Entropy and mutual information
Breaking the Bandwidth Barrier: Geometrical Adaptive Entropy Estimation
Estimators of information-theoretic measures such as entropy and mutual information from samples are a basic workhorse for many downstream applications in modern data science. State-of-the-art approaches have been either geometric (nearest-neighbor (NN) based) or kernel based (with a bandwidth chosen to be data independent and vanishing sublinearly in the sample size). In this paper we combine both approaches to design new estimators of entropy and mutual information that strongly outperform all state-of-the-art methods. Our estimator uses a bandwidth given by fixed $k$-NN distances; such a choice is both data dependent and linearly vanishing in the sample size, and it necessitates a bias-cancellation term that is universal and independent of the underlying distribution. As a byproduct, we obtain a unified way of deriving both kernel and NN estimators. The corresponding theoretical contribution, relating the geometry of NN distances to asymptotic order statistics, is of independent mathematical interest.
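For context, the geometric (NN-based) baseline that the abstract contrasts with is the classical Kozachenko–Leonenko fixed-$k$ estimator. Below is a minimal Python sketch of that baseline (assuming `numpy` and `scipy` are available); it is not the paper's bias-corrected kernel/NN hybrid, which additionally applies the universal bias-cancellation term mentioned above.

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def kl_entropy(x, k=3):
    """Kozachenko-Leonenko k-NN differential entropy estimate, in nats.

    x: (n, d) array of continuous samples (duplicate points would give
    zero k-NN distances and break the log, so data must be continuous).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    tree = cKDTree(x)
    # query returns each point itself at column 0, so the k-th neighbor
    # distance sits at column index k
    eps = tree.query(x, k=k + 1)[0][:, k]
    # log volume of the unit d-ball: pi^{d/2} / Gamma(d/2 + 1)
    log_c_d = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return digamma(n) - digamma(k) + log_c_d + d * np.mean(np.log(eps))

# sanity check: a standard Gaussian in d dims has entropy (d/2) log(2*pi*e)
rng = np.random.default_rng(0)
d = 2
samples = rng.standard_normal((5000, d))
print(kl_entropy(samples), d / 2 * np.log(2 * np.pi * np.e))
```

Note that the estimate uses a bandwidth (the $k$-NN distance) that shrinks with the sample size while $k$ stays fixed, which is exactly the regime the abstract describes as data dependent and linearly vanishing.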
Entropy and mutual information in models of deep neural networks
We examine a class of stochastic deep learning models with a tractable method to compute information-theoretic quantities. Our contributions are three-fold: (i) We show how entropies and mutual informations can be derived from heuristic statistical physics methods, under the assumption that the weight matrices are independent and orthogonally invariant. (ii) We extend particular cases in which this result is known to be rigorously exact by providing a proof for two-layer networks with Gaussian random weights, using the recently introduced adaptive interpolation method. (iii) We propose an experiment framework with generative models of synthetic datasets, on which we train deep neural networks with a weight constraint designed so that the assumption in (i) is verified during learning, and use it to study the behavior of entropies and mutual informations throughout training.
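As a toy illustration of why such models admit exact information-theoretic formulas, consider the simplest case: a single linear layer with additive Gaussian noise, $T = WX + \sigma\xi$, where $I(T; X)$ has a closed form. The sketch below is my own single-layer Gaussian example (standard-normal input and noise are assumptions for the illustration), not the paper's multi-layer replica computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_out = 6, 4
W = rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)
sigma = 0.5  # noise standard deviation

# For X ~ N(0, I) and T = W X + sigma * xi with xi ~ N(0, I):
#   I(T; X) = H(T) - H(T | X)
#           = 0.5 * log det(I + W W^T / sigma^2)   (in nats)
cov = W @ W.T / sigma**2
mi = 0.5 * np.linalg.slogdet(np.eye(n_out) + cov)[1]
print(mi)
```

The multi-layer, nonlinear case studied in the paper replaces this determinant with a replica-symmetric formula evaluated layer by layer in the high-dimensional limit.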
Information flow in multilayer perceptrons: an in-depth analysis
Analysing how information flows along the layers of a multilayer perceptron is a topic of paramount importance in the field of artificial neural networks. After framing the problem from the point of view of information theory, this position article conducts a specific investigation into the way information is processed, with particular reference to the requirements imposed by supervised learning. To this end, the concept of an information matrix is devised and then used as a formal framework for understanding the aetiology of optimisation strategies and for studying the information flow. The underlying research for this article has also produced several key outcomes: (i) the definition of a parametric optimisation strategy, (ii) the finding that the optimisation strategy proposed in the information bottleneck framework shares strong similarities with the one derived from the information matrix, and (iii) the insight that a multilayer perceptron serves as a kind of "adaptor", meant to process the input according to the given objective.
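For readers who want to probe per-layer information flow empirically, a common (if crude) approach from the information bottleneck literature is to bin activations and use plug-in estimates. The sketch below is an illustrative assumption on my part, with hypothetical helpers `discrete_mi` and `layer_mi` and an arbitrary `n_bins`; it is not the information-matrix machinery the article develops.

```python
import numpy as np

def discrete_mi(a, b):
    """Plug-in mutual information (nats) between two discrete arrays."""
    ja = np.unique(a, return_inverse=True)[1]
    jb = np.unique(b, return_inverse=True)[1]
    joint = np.zeros((ja.max() + 1, jb.max() + 1))
    np.add.at(joint, (ja, jb), 1.0)   # empirical joint counts
    p = joint / joint.sum()
    pa, pb = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log(p[nz] / (pa @ pb)[nz])))

def layer_mi(activations, labels, n_bins=30):
    """I(T; Y) after binning each unit's activation into equal-width bins.

    activations: (n_samples, n_units) array for one hidden layer.
    """
    edges = np.linspace(activations.min(), activations.max(), n_bins + 1)
    binned = np.digitize(activations, edges)
    # treat each sample's whole binned activation pattern as one symbol
    symbols = np.array([hash(row.tobytes()) for row in binned])
    return discrete_mi(symbols, labels)

# usage: call layer_mi(hidden_activations, class_labels) per layer to
# track how I(T; Y) evolves along the network or during training
```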
Reviews: Entropy and mutual information in models of deep neural networks
Contributions of the paper: The authors consider a stylized statistical model for data that respects neural network architecture, i.e. a Markov structure of the type $T_\ell = \varphi(W_\ell T_{\ell-1}, \xi_\ell)$, where $T_0 = X$ is the input, $T_L = y$ is the output label, the $W_\ell$ are random, independent weight matrices, and $\varphi$ is a nonlinearity applied elementwise to its first argument, possibly using external randomness $\xi_\ell$. For data generated from this specific model, they make the following contributions. They show that under this stylized model, one can obtain a simple formula for the (normalized, i.e. per-unit) entropy $H(T_\ell)/n$ and mutual information $I(T_\ell; X)/n$ between the input data and each successive layer, in the high-dimensional limit. This formula is, in general, derived using the non-rigorous replica method from statistical physics. The experimental results are multi-faceted and include a comparison with entropy/mutual information estimators, validation of the replica formula, and some applications to the recent information bottleneck proposal of Tishby et al. Summary: The paper is a solid contribution, and I would argue that it is a clear accept.
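To make the Markov structure concrete, here is a minimal sketch that generates data from such a stylized model. The choices of $\varphi$ as a ReLU, additive Gaussian noise as the external randomness $\xi_\ell$, and the specific layer widths are my own illustrative assumptions; the replica formula for $H(T_\ell)/n$ itself is not implemented here.

```python
import numpy as np

rng = np.random.default_rng(1)

def layer(t_prev, w, noise_std=0.1):
    """One stochastic layer T_l = phi(W_l T_{l-1}, xi_l).

    Here phi is a ReLU and xi_l enters as additive Gaussian noise.
    """
    pre = w @ t_prev
    return np.maximum(pre, 0.0) + noise_std * rng.standard_normal(pre.shape)

# layer widths; i.i.d. Gaussian weights match the rigorously proven case
dims = [100, 80, 60]
x = rng.standard_normal((dims[0], 2000))   # 2000 input samples, T_0 = X
weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]

t = x
for w in weights:
    t = layer(t, w)   # successive representations T_1, T_2, ...
print(t.shape)        # (60, 2000): activations ready for an entropy estimator
```

Feeding each layer's activations to an entropy estimator (such as the $k$-NN sketch earlier in this document) is the kind of comparison the review refers to when it mentions validating the replica formula against estimators.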