Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration
