An Improved Analysis of Training Over-parameterized Deep Neural Networks
Neural Information Processing Systems
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network that ensures global convergence is very stringent: it is often a high-degree polynomial in the training sample size n (e.g., O(n^24)).