Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function