Over-parameterization Improves Generalization in the XOR Detection Problem

Alon Brutzkus, Amir Globerson

arXiv.org Machine Learning 

Most successful deep learning models use more parameters than are needed to reach zero training error. This is typically referred to as over-parameterization. Indeed, it can be argued that over-parameterization is one of the key techniques that has led to the remarkable success of neural networks. However, there is still no theoretical account of its effectiveness. One intriguing observation in this context is that over-parameterized networks with ReLU activations often exhibit better generalization error than smaller networks (Neyshabur et al., 2014, 2018; Novak et al., 2018). This somewhat counterintuitive observation suggests that over-parameterized networks have an inductive bias towards solutions with better generalization performance. Understanding this inductive bias is a major theoretical challenge and a necessary step towards a full understanding of neural networks in practice. To better understand this phenomenon, it is crucial to be able to reason about the optimization and generalization properties of over-parameterized networks with ReLU activations that are trained with gradient-based methods. At present, however, these properties are far from understood. In fact, no optimization or generalization guarantees exist for these networks even in very simple learning tasks such as the classic XOR problem.
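As a minimal illustration of the classic XOR problem mentioned above, the sketch below hand-sets the weights of a ReLU network with two hidden units so that it fits all four XOR points exactly. The function names (relu, xor_net) and the specific weights are assumptions chosen for illustration; they are not the construction or training procedure analyzed in the paper. The point is only that two hidden units already suffice for zero training error on this task, so any wider network is over-parameterized in the sense described above.

```python
def relu(z):
    """Rectified linear unit."""
    return max(0.0, z)


def xor_net(x1, x2):
    """Two-hidden-unit ReLU network with hand-picked (not learned) weights."""
    # Hidden layer: both units use input weights (1, 1); biases are 0 and -1.
    h1 = relu(x1 + x2)
    h2 = relu(x1 + x2 - 1.0)
    # Output layer: weights (1, -2), no bias.
    return h1 - 2.0 * h2


inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [a ^ b for a, b in inputs]            # XOR targets
outputs = [xor_net(a, b) for a, b in inputs]   # network predictions

for (a, b), y, out in zip(inputs, labels, outputs):
    print(f"x=({a}, {b})  target={y}  network={out}")
```

An over-parameterized network for the same task would use many more hidden units than these two and would be trained with a gradient-based method rather than set by hand.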
