Nonlinear Conjugate Gradients For Scaling Synchronous Distributed DNN Training
