Convergence Analysis of Two-layer Neural Networks with ReLU Activation

Nov-21-2025, 15:56:48 GMT–Neural Information Processing Systems

In recent years, techniques based on stochastic gradient descent (SGD) have become the standard tools for training neural networks. However, a formal theoretical understanding of why SGD can train neural networks in practice is largely missing. In this paper, we make progress on understanding this mystery by providing a convergence analysis for SGD on a rich subset of two-layer feedforward networks with ReLU activations. This subset is characterized by a special structure called identity mapping. We prove that, if the input follows a Gaussian distribution, then with the standard $O(1/\sqrt{d})$ initialization of the weights, where $d$ is the input dimension, SGD converges to the global minimum in a polynomial number of steps.
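To make the setting concrete, here is a minimal simulation sketch. It assumes the identity-mapping network has the form f(x; W) = Σ_i ReLU(⟨e_i + w_i, x⟩) = ‖ReLU((I + W)^T x)‖_1, where w_i is the i-th column of a d×d matrix W. It also assumes a realizable "teacher" W* generates the labels, the loss is squared error, and SGD is run online with a fresh x ~ N(0, I_d) at every step. The teacher scale, learning rate, dimension, step count, and the helper names `forward`, `population_loss` and `sgd` are hypothetical choices for illustration. They are not the paper's experimental setup, and the sketch demonstrates the setting, not the proof.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def forward(W, X):
    """Assumed model: f(x; W) = sum_i ReLU(<e_i + w_i, x>) for each row x of X.

    Column i of W is w_i, so the pre-activations are (I + W)^T x,
    i.e. the rows of X @ (I + W). The identity term is the identity mapping.
    """
    d = W.shape[0]
    return relu(X @ (np.eye(d) + W)).sum(axis=1)


def population_loss(W, W_star, X):
    """Monte Carlo estimate of E_x[0.5 * (f(x; W) - f(x; W_star))^2]."""
    return 0.5 * np.mean((forward(W, X) - forward(W_star, X)) ** 2)


def sgd(W0, W_star, lr, steps, rng):
    """Online SGD on the squared loss; every step draws a fresh x ~ N(0, I_d)."""
    W = W0.copy()
    d = W.shape[0]
    eye = np.eye(d)
    U_star = eye + W_star                      # teacher weights, fixed
    for _ in range(steps):
        x = rng.standard_normal(d)
        z = (eye + W).T @ x                    # student pre-activations
        y = relu(U_star.T @ x).sum()           # label from the assumed teacher
        residual = relu(z).sum() - y
        # Gradient of f with respect to W[j, i] is x_j * 1[z_i > 0].
        grad = residual * np.outer(x, z > 0)
        W -= lr * grad
    return W


rng = np.random.default_rng(0)
d = 10
# Hypothetical teacher with a small norm (an illustrative choice).
W_star = 0.5 * rng.standard_normal((d, d)) / np.sqrt(d)
# Standard O(1/sqrt(d)) initialization: i.i.d. N(0, 1/d) entries.
W0 = rng.standard_normal((d, d)) / np.sqrt(d)
X_eval = rng.standard_normal((5000, d))        # fixed Gaussian evaluation set

loss_before = population_loss(W0, W_star, X_eval)
# Step size ~ 1/d^2 keeps each update stable, since ||x||^2 * (#active units) is O(d^2).
W_final = sgd(W0, W_star, lr=0.2 / d**2, steps=20_000, rng=rng)
loss_after = population_loss(W_final, W_star, X_eval)
print(f"loss before: {loss_before:.4f}, after: {loss_after:.6f}")
```

Because the labels come from a network of the same form, the global minimum has zero loss. Any column permutation of (I + W*) also reaches it, since f only sums over units. Online sampling mirrors the Gaussian-input assumption in the abstract, rather than reusing a finite training set.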

convergence analysis, name change, two-layer neural network
