Global Convergence Analysis of Local SGD for Two-layer Neural Network without Overparameterization

Neural Information Processing Systems 

Local SGD, a cornerstone algorithm in federated learning, is widely used for training deep neural networks and has been shown to have strong empirical performance. A theoretical understanding of this performance on nonconvex loss landscapes is currently lacking. Analyzing the global convergence of SGD is challenging because the stochastic gradient noise depends on the model parameters. Indeed, many prior works narrow their focus to GD and rely on injecting artificial noise to enable convergence to a local or global optimum.
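For readers unfamiliar with the algorithm under analysis, the following is a minimal, illustrative sketch of Local SGD on a two-layer ReLU network with synthetic data: each of K workers runs H local SGD steps on its own data shard, after which the server averages the worker parameters. All names, hyperparameters, and the data-generating process here are assumptions for illustration, not the paper's exact setting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hyperparameters: K workers, H local steps per round.
K, H, rounds, lr, batch = 4, 10, 50, 0.05, 32
d, m, n = 10, 32, 1024                       # input dim, hidden width, samples

# Synthetic regression data, split evenly across workers.
X = rng.normal(size=(n, d))
y = np.tanh(X @ rng.normal(size=d))          # arbitrary smooth target
shards = np.array_split(rng.permutation(n), K)

def init():
    # Two-layer network f(x) = a^T relu(W x).
    return {"W": rng.normal(size=(m, d)) / np.sqrt(d),
            "a": rng.normal(size=m) / np.sqrt(m)}

def grads(p, xb, yb):
    # Gradients of the squared loss 0.5 * mean((f(x) - y)^2).
    h = np.maximum(xb @ p["W"].T, 0.0)       # (b, m) hidden activations
    err = h @ p["a"] - yb                    # (b,) residuals
    ga = h.T @ err / len(yb)
    gW = ((err[:, None] * (h > 0) * p["a"]).T @ xb) / len(yb)
    return {"W": gW, "a": ga}

server = init()
for r in range(rounds):
    # Broadcast: each worker starts the round from the server model.
    workers = [{k: v.copy() for k, v in server.items()} for _ in range(K)]
    for k in range(K):
        for _ in range(H):                   # H local SGD steps on shard k
            idx = rng.choice(shards[k], size=batch)
            g = grads(workers[k], X[idx], y[idx])
            for key in workers[k]:
                workers[k][key] -= lr * g[key]
    # Communication round: average worker parameters into the server model.
    for key in server:
        server[key] = np.mean([p[key] for p in workers], axis=0)

h = np.maximum(X @ server["W"].T, 0.0)
print("final train MSE:", np.mean((h @ server["a"] - y) ** 2))
```

Note that the stochastic gradient noise in this sketch arises from the mini-batch sampling at the current iterate, so its distribution changes with the model parameters; this is the parameter-dependent noise that the abstract identifies as the main obstacle to a global convergence analysis.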
