Search query: two-layer linear network
The Impact of Anisotropic Covariance Structure on the Training Dynamics and Generalization Error of Linear Networks
Taishi Watanabe, Ryo Karakida, Jun-nosuke Teramae
The success of deep neural networks largely depends on the statistical structure of the training data. While learning dynamics and generalization on isotropic data are well-established, the impact of pronounced anisotropy on these crucial aspects is not yet fully understood. We examine the impact of data anisotropy, represented by a spiked covariance structure, a canonical yet tractable model, on the learning dynamics and generalization error of a two-layer linear network in a linear regression setting. Our analysis reveals that the learning dynamics proceed in two distinct phases, governed initially by the input-output correlation and subsequently by other principal directions of the data structure. Furthermore, we derive an analytical expression for the generalization error, quantifying how the alignment of the spike structure of the data with the learning task improves performance. Our findings offer deep theoretical insights into how data anisotropy shapes the learning trajectory and final performance, providing a foundation for understanding complex interactions in more advanced network architectures.
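As a rough illustration of the setting this abstract describes, here is a minimal NumPy sketch, not the paper's exact construction: the dimensions, spike strength alpha, initialization scale, and learning rate are all arbitrary choices. It trains a two-layer linear network f(x) = W2 W1 x by gradient descent on regression data whose inputs have spiked covariance I + alpha * u u^T, with the teacher aligned to the spike direction u.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (not the paper's exact parameters): inputs with
# spiked covariance Sigma = I + alpha * u u^T, a linear teacher aligned
# with the spike, and a two-layer linear student trained by gradient descent.
d, h, n, alpha = 20, 10, 500, 9.0
u = rng.normal(size=d)
u /= np.linalg.norm(u)                               # spike direction
X = rng.normal(size=(n, d)) + np.sqrt(alpha) * rng.normal(size=(n, 1)) * u
y = X @ u                                            # task aligned with the spike

W1 = 1e-3 * rng.normal(size=(h, d))                  # small initialization
W2 = 1e-3 * rng.normal(size=(1, h))
lr = 5e-3
for t in range(2000):
    pred = X @ W1.T @ W2.T                           # (n, 1)
    err = pred - y[:, None]
    loss = 0.5 * np.mean(err ** 2)
    gW2 = (err.T @ (X @ W1.T)) / n                   # dL/dW2
    gW1 = (W2.T @ err.T @ X) / n                     # dL/dW1
    W1 -= lr * gW1
    W2 -= lr * gW2
    if t % 400 == 0:
        print(f"step {t:4d}  loss {loss:.4e}")
```

With a small initialization scale, the printed loss typically shows a plateau followed by a rapid drop along the task-aligned direction, the kind of phase structure the abstract attributes to the input-output correlation dominating early learning.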
On the spectral bias of two-layer linear networks
This paper studies the behaviour of two-layer fully connected networks with linear activations trained with gradient flow on the square loss. We show how the optimization process carries an implicit bias on the parameters that depends on the scale of its initialization. The main result of the paper is a variational characterization of the loss minimizers retrieved by the gradient flow for a specific initialization shape. This characterization reveals that, in the small scale initialization regime, the linear neural network's hidden layer is biased toward having a low-rank structure. To complement our results, we showcase a hidden mirror flow that tracks the dynamics of the singular values of the weights matrices and describe their time evolution. We support our findings with numerical experiments illustrating the phenomena.
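A minimal sketch of the low-rank bias described above, under stated assumptions: discrete gradient descent with a small step size stands in for the paper's gradient flow, and the matrix sizes, target, and initialization scale are illustrative choices rather than the paper's setup. It tracks the singular values of the hidden-layer weights W1 during training on a rank-1 regression task.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical illustration: train W2 W1 on a rank-1 target map with
# small-scale initialization and watch the singular values of W1.
d, h, n = 15, 15, 400
X = rng.normal(size=(n, d))
w_star = rng.normal(size=(d, 1))
y = X @ w_star                                       # rank-1 target map

scale = 1e-4                                         # small initialization regime
W1 = scale * rng.normal(size=(h, d))
W2 = scale * rng.normal(size=(1, h))
lr = 1e-2
for t in range(5001):
    err = X @ W1.T @ W2.T - y                        # (n, 1)
    gW2 = (err.T @ (X @ W1.T)) / n
    gW1 = (W2.T @ err.T @ X) / n
    W1 -= lr * gW1
    W2 -= lr * gW2
    if t % 1000 == 0:
        sv = np.linalg.svd(W1, compute_uv=False)
        print(f"step {t:5d}  top-3 singular values of W1: {sv[:3].round(4)}")

# With small init, one singular value grows while the rest stay near zero,
# i.e. the hidden layer remains approximately rank-1.
```

Printing the spectrum at intervals shows a single singular value escaping the near-zero plateau while the others stay at the initialization scale, consistent with the low-rank bias in the small-scale initialization regime.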