Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function
