Mildly Overparametrized Neural Nets can Memorize Training Data Efficiently

Rong Ge, Runzhe Wang, Haoyu Zhao

arXiv.org Machine Learning 

However, it has been observed that neural networks are able to fit the training data perfectly even when the data/labels are randomly corrupted (Zhang et al., 2017). Recently, a line of work (Du et al. (2019); Allen-Zhu et al. (2019c); Chizat and Bach (2018); Jacot et al. (2018); see more references in Section 1.2) developed the theory of neural tangent kernels (NTK), which explains the success of training neural networks through overparametrization. Several results showed that if the number of neurons in each layer is much larger than the number of training samples, networks of different architectures (multilayer/recurrent) can all fit the training data perfectly. However, the number of parameters required by the current theoretical analyses makes these networks highly overparametrized. Consider fully connected networks, for example: if a two-layer network has a hidden layer with r neurons, the number of parameters is at least rd, where d is the dimension of the input.
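To make the parameter count concrete, the following short Python sketch compares the number of parameters of a two-layer fully connected network (hidden width r, input dimension d, scalar output) with the number of training samples n. The numeric values of d, n, and r below are illustrative placeholders, not figures from the paper; they only reflect the NTK-style requirement that the width r exceed the sample count n.

# Rough parameter-count sketch for a two-layer fully connected network.
# Assumed setup: input dimension d, hidden width r, scalar output.
# Example values are illustrative placeholders, not numbers from the paper.
d = 1_000      # input dimension
n = 50_000     # number of training samples
r = n          # NTK-style analyses require the width r to be at least n

first_layer_params = r * d   # hidden-layer weight matrix: r x d
second_layer_params = r      # output weights: one per hidden neuron
total_params = first_layer_params + second_layer_params

print(f"hidden width r   = {r:,}")
print(f"total parameters = {total_params:,}  (at least r*d = {r * d:,})")
print(f"parameters per training sample = {total_params / n:,.1f}")

With these placeholder values the network has on the order of r*d = 50 million parameters for only 50,000 samples, which illustrates why such analyses are described as highly overparametrized.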
