Information in Infinite Ensembles of Infinitely-Wide Neural Networks

Ravid Shwartz-Ziv, Alexander A. Alemi

arXiv.org, Machine Learning

One promising research direction is to view deep neural networks through the lens of information theory (Tishby and Zaslavsky, 2015). Abstractly, there are deep connections between the information a learning algorithm extracts and its generalization capabilities (Bassily et al., 2017; Banerjee, 2006). Inspired by these general results, recent papers have attempted to measure information-theoretic quantities in ordinary deterministic neural networks (Shwartz-Ziv and Tishby, 2017; Achille and Soatto, 2017; Achille and Soatto, 2019). Both practical and theoretical problems arise in the deterministic case (Amjad and Geiger, 2018; Saxe et al., 2018; Kolchinsky et al., 2018). These difficulties stem from the fact that mutual information (MI) is invariant under reparameterization (Cover and Thomas, 2012): because a deterministic network computes a fixed (often injective) function of its input, the MI between the input and any hidden representation is infinite for continuous inputs and constant for discrete ones, regardless of the learned weights. One workaround is to make the network explicitly stochastic, either in its activations (Alemi et al., 2016) or in its weights (Achille and Soatto, 2017).
