Distributed Gradient Clustering: Convergence and the Effect of Initialization