
Neural Information Processing Systems

In this work, we focus on a classification problem and investigate the behavior of both the non-calibrated and the calibrated negative log-likelihood (CNLL) of a deep ensemble as a function of the ensemble size and the member network size.
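The quantity above can be sketched in code — a minimal illustration, not the paper's implementation. Temperature scaling as the calibration map, the temperature grid, and the function names are assumptions for this sketch:

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def ensemble_nll(logits, labels, temperature=1.0):
    """NLL of a deep ensemble: average the members' softmax outputs,
    then score the mean prediction.
    logits: (members, samples, classes); labels: (samples,)."""
    probs = softmax(logits / temperature, axis=-1).mean(axis=0)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

def calibrated_nll(logits, labels, temps=np.linspace(0.5, 3.0, 26)):
    """CNLL sketch: NLL minimized over a temperature grid
    (in practice the temperature is fit on held-out data)."""
    return min(ensemble_nll(logits, labels, t) for t in temps)
```

Evaluating `calibrated_nll(logits[:k], labels)` for growing `k` traces the CNLL-vs-ensemble-size curve the work studies.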




[Figure caption fragment: "… for fully connected networks trained on MNIST vs. depth"]

Neural Information Processing Systems

We thank the reviewers for their detailed and insightful reviews. We answer most of the questions here and will incorporate the feedback into the final version. Right: log of the leading terms for the spectral bound vs. our bound on a WideResNet trained on CIFAR-10 at different depths. In Figure 1, we address questions about the empirical evaluation of our bounds. The primary challenge is that Theorem 5.1 requires the augmented indicators on the Jacobian norms to themselves be Lipschitz w.r.t. the hidden layers.



On the Disconnect Between Theory and Practice of Overparametrized Neural Networks

Wenger, Jonathan, Dangel, Felix, Kristiadi, Agustinus

arXiv.org Machine Learning

The infinite-width limit of neural networks (NNs) has garnered significant attention as a theoretical framework for analyzing the behavior of large-scale, overparametrized networks. As width approaches infinity, NNs effectively converge to a linear model with features characterized by the neural tangent kernel (NTK). This establishes a connection between NNs and kernel methods, the latter of which are well understood. Based on this link, theoretical benefits and algorithmic improvements have been hypothesized and empirically demonstrated in synthetic architectures. These advantages include faster optimization, reliable uncertainty quantification, and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that practically relevant architectures do not exhibit behavior as predicted via the NTK. In this work, we empirically investigate whether the limiting regime either describes the behavior of large-width architectures used in practice or is informative for algorithmic improvements. Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification, or continual learning. This observed disconnect between theory and practice calls into question the practical relevance of the infinite-width limit.
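The NTK mentioned in the abstract can be made concrete for a toy network — a hedged sketch, not the paper's setup. The one-hidden-layer architecture, the 1/sqrt(m) scaling, and the analytic gradients below are illustrative assumptions; the empirical NTK is just the inner product of parameter gradients at two inputs:

```python
import numpy as np

def ntk_entry(W, v, x1, x2):
    """Empirical NTK entry K(x1, x2) for the toy network
    f(x) = v @ relu(W @ x) / sqrt(m), computed as the inner product
    of the parameter gradients of f at x1 and x2."""
    m = W.shape[0]

    def grads(x):
        pre = W @ x                                  # pre-activations
        act = np.maximum(pre, 0.0)                   # ReLU features
        g_v = act / np.sqrt(m)                       # d f / d v
        g_W = np.outer(v * (pre > 0), x) / np.sqrt(m)  # d f / d W
        return np.concatenate([g_v, g_W.ravel()])

    return grads(x1) @ grads(x2)
```

As the width `m` grows, this kernel concentrates around a deterministic limit and changes little during training, which is what makes the linear (kernel) description valid; the abstract's point is that practically relevant widths appear to be far from that regime.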