Review: Entropy and mutual information in models of deep neural networks
Neural Information Processing Systems
Contributions of the paper: The authors consider a stylized statistical model for data that respects the neural network architecture, i.e. a Markov structure of the type T_\ell = \varphi(W_\ell T_{\ell-1}, \xi_\ell), where T_0 = X is the input, T_L = y is the output label, the W_\ell are random, independent weight matrices, and \varphi is a nonlinearity applied elementwise to its first argument, possibly using external randomness \xi_\ell. For data generated from this specific model, they make the following contributions. They show that under this stylized model one can obtain a simple formula for the normalized (i.e. per-unit) entropy H(T_\ell)/n and the mutual information I(T_\ell; X)/n between the input data and each successive layer, in the high-dimensional limit. This formula is derived using the (in general non-rigorous) replica method from statistical physics. The experimental results are multi-faceted and include a comparison with standard entropy/mutual-information estimators, a validation of the replica formula, and applications to the recent information bottleneck proposal of Tishby et al. Summary: The paper is a solid contribution, and I would argue that it is a clear accept.
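To make the stylized generative model concrete, here is a minimal sketch of sampling the Markov chain T_0 = X, T_\ell = \varphi(W_\ell T_{\ell-1}, \xi_\ell). The specific choices below (i.i.d. Gaussian weights scaled by 1/sqrt(n_in), \varphi(h, \xi) = tanh(h) + \xi, and the particular layer widths) are illustrative assumptions, not the paper's exact experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_layers(X, widths, noise_std=0.1):
    """Sample T_0 = X, T_l = phi(W_l T_{l-1}, xi_l) for l = 1..L,
    with the illustrative choice phi(h, xi) = tanh(h) + xi."""
    T = X
    layers = [T]
    for n_out in widths:
        n_in = T.shape[0]
        # Random, independent Gaussian weight matrix, variance 1/n_in.
        W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_out, n_in))
        # External per-layer randomness xi_l (additive Gaussian noise here).
        xi = rng.normal(0.0, noise_std, size=n_out)
        T = np.tanh(W @ T) + xi
        layers.append(T)
    return layers

# One draw from the model: input of dimension 500, three hidden layers.
X = rng.normal(size=500)
layers = sample_layers(X, widths=[400, 300, 200])
```

Each T_\ell in `layers` is one realization of the corresponding layer; the replica formula then characterizes H(T_\ell)/n and I(T_\ell; X)/n as the layer widths grow proportionally.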