Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
