Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization
Neural Information Processing Systems
Local SGD, a cornerstone algorithm in federated learning, is widely used for training deep neural networks and has shown strong empirical performance. A theoretical understanding of this performance on nonconvex loss landscapes is currently lacking. Analyzing the global convergence of SGD is challenging because the gradient noise depends on the model parameters. Indeed, many works narrow their focus to GD and rely on injecting artificial noise to enable convergence to a local or global optimum.
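The abstract above references two ideas worth making concrete: the Local SGD update scheme (workers take several local stochastic gradient steps, then average their models) and the fact that SGD's noise depends on the current parameters. The sketch below illustrates both on a toy quadratic objective with multiplicative (parameter-dependent) noise; all names and hyperparameter values (`num_workers`, `local_steps`, the quadratic loss itself) are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of Local SGD on the toy objective f(w) = 0.5 * ||w - w*||^2.
# The stochastic gradient noise is multiplicative: its magnitude scales with
# the gradient norm, so the noise depends on the current parameters,
# echoing the point made in the abstract.
import numpy as np

def local_sgd(w_star, num_workers=4, rounds=50, local_steps=10,
              lr=0.1, noise_scale=0.1, seed=0):
    rng = np.random.default_rng(seed)
    d = w_star.shape[0]
    w = np.zeros(d)  # shared initialization across workers
    for _ in range(rounds):
        local_models = []
        for _ in range(num_workers):
            wk = w.copy()
            for _ in range(local_steps):
                grad = wk - w_star  # exact gradient of the quadratic
                # parameter-dependent noise: shrinks near the optimum
                noise = noise_scale * np.linalg.norm(grad) * rng.standard_normal(d)
                wk -= lr * (grad + noise)
            local_models.append(wk)
        w = np.mean(local_models, axis=0)  # communication round: average models
    return w

w_star = np.array([1.0, -2.0, 3.0])
w_final = local_sgd(w_star)
```

On this strongly convex toy problem the averaged iterate contracts toward `w_star` every local step, so `w_final` ends up very close to the optimum; the paper's setting (a two-layer network without overparameterization) is nonconvex and far harder, which is exactly why its global convergence analysis is nontrivial.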