Memorization and Optimization in Deep Neural Networks with Minimum Over-parameterization
Neural Information Processing Systems
The Neural Tangent Kernel (NTK) has emerged as a powerful tool for providing memorization, optimization, and generalization guarantees in deep neural networks. A line of work has studied the NTK spectrum for two-layer and deep networks with at least one layer with $\Omega(N)$ neurons, $N$ being the number of training samples. Furthermore, there is increasing evidence that deep networks with sub-linear layer widths are powerful memorizers and optimizers, as long as the number of parameters exceeds the number of samples. Thus, a natural open question is whether the NTK is well conditioned in such a challenging sub-linear setup. In this paper, we answer this question in the affirmative. Our key technical contribution is a lower bound on the smallest NTK eigenvalue for deep networks with the minimum possible over-parameterization: up to logarithmic factors, the number of parameters is $\Omega(N)$ and, hence, the number of neurons is as little as $\Omega(\sqrt{N})$.
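As a concrete illustration of the quantity the abstract refers to, the sketch below forms the empirical NTK Gram matrix $K_{ij} = \langle \nabla_\theta f(x_i), \nabla_\theta f(x_j) \rangle$ of a small deep ReLU network whose hidden widths scale roughly as $\sqrt{N}$, and reports its smallest eigenvalue. The architecture, initialization, widths, and data here are illustrative assumptions, not the paper's exact setting.

```python
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def init_params(key, widths):
    """He-style initialization for a fully connected ReLU network (assumed setup)."""
    params = []
    for d_in, d_out in zip(widths[:-1], widths[1:]):
        key, sub = jax.random.split(key)
        params.append(jax.random.normal(sub, (d_out, d_in)) * jnp.sqrt(2.0 / d_in))
    return params

def forward(params, x):
    """Scalar-output deep ReLU network."""
    h = x
    for W in params[:-1]:
        h = jax.nn.relu(W @ h)
    return (params[-1] @ h)[0]

def ntk_gram(params, X):
    """Empirical NTK Gram matrix: K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>."""
    flat_grad = lambda x: ravel_pytree(jax.grad(forward)(params, x))[0]
    J = jax.vmap(flat_grad)(X)   # (N, num_params) Jacobian of the network output
    return J @ J.T               # (N, N) NTK Gram matrix

key, data_key = jax.random.split(jax.random.PRNGKey(0))
N, d = 64, 16                                      # illustrative sample size and input dimension
width = int(jnp.ceil(jnp.sqrt(N)))                 # sub-linear width: roughly sqrt(N) neurons per layer
params = init_params(key, [d, width, width, 1])    # two hidden layers; num_params exceeds N
X = jax.random.normal(data_key, (N, d))
X = X / jnp.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs

K = ntk_gram(params, X)
print("smallest NTK eigenvalue:", float(jnp.linalg.eigvalsh(K)[0]))
```

A strictly positive smallest eigenvalue of this Gram matrix is what underlies the memorization and optimization guarantees; the paper's result lower-bounds it in the regime where the layer widths are only on the order of $\sqrt{N}$.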