A Communication-Efficient Distributed Gradient Clipping Algorithm for Training Deep Neural Networks