Delayed Gradient Averaging: Tolerate the Communication Latency in Federated Learning