Feature Learning in L_2-regularized DNNs: Attraction/Repulsion and Sparsity
We study the loss surface of DNNs with L_{2} regularization. We show that the loss in terms of the parameters can be reformulated into a loss in terms of the layerwise activations Z_{\ell} of the training set. This reformulation reveals the dynamics behind feature learning: each hidden representation Z_{\ell} is optimal with respect to an attraction/repulsion problem and interpolates between the input and output representations, keeping as little information from the input as necessary to construct the activation of the next layer. For positively homogeneous non-linearities, the loss can be further reformulated in terms of the covariances of the hidden representations, which takes the form of a partially convex optimization over a convex cone. This second reformulation allows us to prove a sparsity result for homogeneous DNNs: any local minimum of the L_{2}-regularized loss can be achieved with at most N(N+1) neurons in each hidden layer (where N is the size of the training set). We show that this bound is tight by giving an example of a local minimum that requires N^{2}/4 hidden neurons. But we also observe numerically that in more traditional settings far fewer than N^{2} neurons are required to reach the minima.
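To make the first reformulation concrete, the parameter-space objective and its representation-space counterpart can be sketched as follows. This is a schematic rendering in the abstract's notation: the layerwise cost R_\ell below is our shorthand for the minimal squared weight norm realizing one layer, and the paper's precise definition may differ.

$$\min_{\theta}\; C\big(Z_L(\theta)\big) + \lambda \sum_{\ell=1}^{L} \lVert W_\ell \rVert_F^{2} \quad\longleftrightarrow\quad \min_{Z_1,\dots,Z_L}\; C(Z_L) + \lambda \sum_{\ell=1}^{L} R_\ell\big(Z_{\ell-1}, Z_\ell\big),$$

where Z_0 = X is the matrix of training inputs, Z_\ell collects the activations of layer \ell over the training set, and R_\ell(Z_{\ell-1}, Z_\ell) = \min\{\lVert W_\ell \rVert_F^{2} : Z_\ell = \sigma(W_\ell Z_{\ell-1})\}. Minimizing over each Z_\ell in turn is what, per the abstract, yields the attraction/repulsion characterization of the hidden representations.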
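The closing numerical remark can be probed in spirit with a short experiment: train an over-wide homogeneous (ReLU, bias-free) network with an explicit L_{2} penalty and count how many hidden neurons survive. This is a minimal PyTorch sketch, not the authors' code; the width, learning rate, lambda, and activity threshold are illustrative choices.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
N, d_in, width = 8, 2, 512   # width chosen far larger than the N(N+1) bound
X = torch.randn(N, d_in)
y = torch.randn(N, 1)

# one-hidden-layer ReLU network: positively homogeneous, no biases
model = nn.Sequential(nn.Linear(d_in, width, bias=False),
                      nn.ReLU(),
                      nn.Linear(width, 1, bias=False))
lam = 1e-3
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for _ in range(20_000):
    opt.zero_grad()
    mse = ((model(X) - y) ** 2).mean()
    l2 = sum((p ** 2).sum() for p in model.parameters())  # explicit L2 regularizer
    (mse + lam * l2).backward()
    opt.step()

# a hidden neuron contributes nothing once its incoming or outgoing weights
# vanish, so measure the product of the two magnitudes per neuron
with torch.no_grad():
    W1 = model[0].weight                          # shape (width, d_in)
    W2 = model[2].weight                          # shape (1, width)
    strength = W1.norm(dim=1) * W2.abs().squeeze(0)
    active = (strength > 1e-4 * strength.max()).sum().item()
print(f"active neurons: {active}, N(N+1) bound: {N * (N + 1)}")
```

Because the penalty drives redundant neurons' weights toward zero, the surviving count typically lands well below the N(N+1) = 72 bound for N = 8, in line with the abstract's observation.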
Neural Information Processing Systems