On Generalization Bounds for Neural Networks with Low Rank Layers

Andrea Pinto, Akshay Rangamani, Tomaso Poggio

arXiv.org Machine Learning 

Deep learning has achieved remarkable success across a wide range of applications, including computer vision [2, 3], natural language processing [4, 5], decision-making in novel environments [6], and code generation [7], among others. Understanding the reasons behind the effectiveness of deep learning is a multifaceted challenge that involves questions about architectural choices, optimizer selection, and the types of inductive biases that can guarantee generalization. A long-standing question in this field is how deep learning finds solutions that generalize well. Good generalization by overparameterized models is not unique to deep learning: in linear models and kernel machines, it can be explained by the implicit bias of learning algorithms towards low-norm solutions [8, 9]. In deep learning, however, identifying the right implicit bias and obtaining generalization bounds that depend on it remain open questions.

In recent years, Rademacher bounds have been developed to explain the complexity control induced by an important bias in deep network training: the minimization of weight matrix norms, which arises from explicit or implicit regularization [10, 11, 12, 13]. For rather general network architectures, Golowich et al. [14] showed that the Rademacher complexity scales linearly with the product of the Frobenius norms of the layers. Although the associated bounds are usually orders of magnitude larger than the generalization gap for dense networks, very recent results by Galanti et al. [15] demonstrate that for networks with structural sparsity in their weight matrices, such as convolutional networks, norm-based Rademacher bounds approach non-vacuity.
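To make this norm-based complexity control concrete, the display below sketches the general shape of such a bound, up to absolute constants, as we recall it from Golowich et al. [14] for depth-$L$ networks with $1$-Lipschitz, positive-homogeneous activations; the symbols $B$ (bound on the input norm), $M_F(j)$ (bound on the Frobenius norm of the $j$-th weight matrix), and $m$ (sample size) are notation introduced here for illustration and may differ from theirs:

\[
  \mathcal{R}_m(\mathcal{F}) \;\lesssim\; \frac{B \,\sqrt{L}\, \prod_{j=1}^{L} M_F(j)}{\sqrt{m}} ,
\]

where $\mathcal{F}$ denotes the class of depth-$L$ networks whose $j$-th layer has Frobenius norm at most $M_F(j)$. For fixed depth and sample size, the complexity is thus governed by the product of the per-layer norms, which is precisely the quantity that explicit or implicit norm regularization keeps small.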