We thank the reviewers for the detailed and insightful reviews. We answer most of the questions and will incorporate the feedback into the final version. In Figure 1, we address questions about the empirical evaluation of our bounds. Left: (…) for fully connected networks trained on MNIST vs. depth. Right: Log leading terms for spectral vs. our bound on WideResNet trained on CIFAR10 using different depths. The primary challenge is that Theorem 5.1 requires the augmented indicators on the Jacobian norms to be themselves Lipschitz w.r.t. the hidden layers.
On the Disconnect Between Theory and Practice of Overparametrized Neural Networks
Jonathan Wenger, Felix Dangel, Agustinus Kristiadi
The infinite-width limit of neural networks (NNs) has garnered significant attention as a theoretical framework for analyzing the behavior of large-scale, overparametrized networks. As their width approaches infinity, NNs effectively converge to a linear model with features characterized by the neural tangent kernel (NTK). This establishes a connection between NNs and kernel methods, the latter of which are well understood. Based on this link, theoretical benefits and algorithmic improvements have been hypothesized and empirically demonstrated in synthetic architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This requirement raises concerns that practically relevant architectures do not exhibit behavior as predicted by the NTK. In this work, we empirically investigate whether the limiting regime describes the behavior of large-width architectures used in practice or is informative for algorithmic improvements. Our empirical results demonstrate that this is not the case in optimization, uncertainty quantification or continual learning. This observed disconnect between theory and practice calls into question the practical relevance of the infinite-width limit.
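The NTK the abstract refers to can be made concrete at finite width: for a network f(x; θ), the empirical NTK is the inner product of parameter gradients, K(x, x') = ∇_θ f(x)ᵀ ∇_θ f(x'), and the kernel-regime claim is that this matrix stays essentially fixed during training as width grows. A minimal numpy sketch for a toy one-hidden-layer network (the architecture, initialization scale, and function names here are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy one-hidden-layer network f(x) = w2 . tanh(W1 x) with scalar output,
# using NTK-style 1/sqrt(fan-in) scaling at initialization.
d_in, width = 3, 256
W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
w2 = rng.normal(size=width) / np.sqrt(width)

def param_gradient(x):
    """Gradient of the scalar output w.r.t. all parameters, flattened."""
    h = np.tanh(W1 @ x)
    dW1 = np.outer(w2 * (1.0 - h**2), x)  # d f / d W1  (chain rule through tanh)
    dw2 = h                               # d f / d w2
    return np.concatenate([dW1.ravel(), dw2])

def empirical_ntk(x, xp):
    """Empirical NTK: inner product of parameter gradients at (x, x')."""
    return param_gradient(x) @ param_gradient(xp)
```

In the infinite-width limit this kernel concentrates around a deterministic value at initialization and remains (approximately) constant under gradient descent; the paper's point is that widths used in practice can be far from this regime.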