On the Convergence of Memory-Based Distributed SGD