Communication-efficient SGD: From Local SGD to One-Shot Averaging