An Improved Analysis of Training Over-parameterized Deep Neural Networks
Neural Information Processing Systems
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network that ensures global convergence is very stringent: it is often a high-degree polynomial in the training sample size n (e.g., O(n^24)).