CD_GraB_camera_ready

A. Feder Cooper

Neural Information Processing Systems 

Whereas random reshuffling (RR) arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples, achieving a provably faster convergence rate than RR.
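
To make the contrast concrete, the following is a minimal sketch, not the authors' implementation. It assumes that the gradient-based ordering works by greedy sign balancing: each stale gradient is centered by the stale mean, assigned a sign that keeps a running sum small, and the next epoch's order places the positively signed examples first and the negatively signed examples last in reverse. The function names `rr_order` and `balance_reorder` and the list-of-lists gradient layout are hypothetical and chosen only for illustration.

```python
import random


def rr_order(n, rng):
    """Random reshuffling: draw a fresh random permutation of n examples."""
    order = list(range(n))
    rng.shuffle(order)
    return order


def balance_reorder(order, stale_grads):
    """Hypothetical sketch: derive the next epoch's order from stale gradients.

    `order` is the permutation used in the previous epoch, and
    `stale_grads[i]` is the gradient recorded for example i in that epoch.
    """
    n = len(stale_grads)
    dim = len(stale_grads[0])
    # Center gradients with the stale mean so the signed sum can stay near zero.
    mean = [sum(g[d] for g in stale_grads) / n for d in range(dim)]
    running = [0.0] * dim
    front, back = [], []
    for idx in order:
        centered = [stale_grads[idx][d] - mean[d] for d in range(dim)]
        plus = sum((r + c) ** 2 for r, c in zip(running, centered))
        minus = sum((r - c) ** 2 for r, c in zip(running, centered))
        if plus <= minus:
            # Sign +1: add to the running sum, place toward the front.
            running = [r + c for r, c in zip(running, centered)]
            front.append(idx)
        else:
            # Sign -1: subtract from the running sum, place toward the back.
            running = [r - c for r, c in zip(running, centered)]
            back.append(idx)
    # Positively signed examples first, negatively signed ones reversed.
    return front + back[::-1]
```

Under these assumptions, `rr_order(4, random.Random(0))` returns some permutation of the four example indices that ignores gradients, while `balance_reorder([0, 1, 2, 3], [[1.0], [1.0], [-1.0], [-1.0]])` interleaves examples whose centered gradients would otherwise accumulate in one direction.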