A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks