Usable Information and Evolution of Optimal Representations During Training

Kleinman, Michael, Idnani, Daksh, Achille, Alessandro, Kao, Jonathan C.

Oct-5-2020–arXiv.org Machine Learning

We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training, and how they adapt to different tasks. We use this to characterize the transient dynamics of deep neural networks on perceptual decision-making tasks inspired by neuroscience literature. In particular, we show that both the random initialization and the implicit regularization from Stochastic Gradient Descent play an important role in learning minimal sufficient representations for the task. If the network is not randomly initialized, we show that the training may not recover an optimal representation, increasing the chance of overfitting. An important open question for the theory of deep learning is why highly overparametrized neural networks learn solutions that generalize well even though models can in principle memorize the entire training set. Some have speculated that neural networks learn minimal but sufficient representations of the input through implicit regularization of Stochastic Gradient Descent (SGD) (Shwartz-Ziv & Tishby, 2017; Achille & Soatto, 2018), and that the minimality of the representations relates to generalizability. Followup work has disputed some of these claims (Saxe et al., 2018), leading to an ongoing debate on the optimality of representations and how they are learned during training. Here, we design a simple task to empirically study how representations are formed during training, and how implicit regularization from SGD and initializations affect the resulting representations in deep networks. We then validate these results on a variant of an MNIST classification task to assess how SGD affects the minimality of representations. Investigations into the optimality of representations have typically used information-theoretic reasoning.

artificial intelligence, information, machine learning, (18 more...)

arXiv.org Machine Learning

Oct-5-2020

arXiv.org PDF

Add feedback

Country:
- North America > United States
  - Massachusetts > Middlesex County
    - Cambridge (0.04)
  - California > Los Angeles County
    - Los Angeles (0.14)
    - Long Beach (0.04)

Genre:
- Research Report > New Finding (0.93)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.94)
  - Neural Networks > Deep Learning (0.69)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found