Large-Scale Methods for Distributionally Robust Optimization

Daniel Levy, Yair Carmon

Neural Information Processing Systems 

Nesterov acceleration to decrease the required number of gradient steps.
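As a minimal sketch, and not the paper's own algorithm, the following illustrates why acceleration reduces the number of gradient steps. It compares plain gradient descent (step size 1/L) with Nesterov's accelerated method in its constant-momentum form for a μ-strongly convex, L-smooth objective. The test problem is an assumption chosen for illustration: a hypothetical ill-conditioned quadratic with condition number κ = L/μ = 1000. The helper names `make_quadratic`, `gradient_descent`, and `nesterov` are hypothetical. In this setting, standard theory gives roughly O(κ log(1/ε)) steps for gradient descent versus O(√κ log(1/ε)) for the accelerated method.

```python
import math


def make_quadratic(diag):
    """Hypothetical test problem: f(x) = 0.5 * sum(d_i * x_i^2)."""
    def f(x):
        return 0.5 * sum(d * xi * xi for d, xi in zip(diag, x))

    def grad(x):
        return [d * xi for d, xi in zip(diag, x)]

    return f, grad


def gradient_descent(f, grad, x0, L, tol, max_iter=100_000):
    """Plain gradient descent with step 1/L; returns steps until f(x) <= tol."""
    x = list(x0)
    for k in range(max_iter):
        if f(x) <= tol:
            return k
        g = grad(x)
        x = [xi - gi / L for xi, gi in zip(x, g)]
    return max_iter


def nesterov(f, grad, x0, L, mu, tol, max_iter=100_000):
    """Nesterov's accelerated gradient with constant momentum (strongly convex case)."""
    kappa = L / mu
    beta = (math.sqrt(kappa) - 1) / (math.sqrt(kappa) + 1)  # momentum coefficient
    x_prev = list(x0)
    x = list(x0)
    for k in range(max_iter):
        if f(x) <= tol:
            return k
        # Extrapolate along the previous step, then take a gradient step from there.
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = grad(y)
        x_prev, x = x, [yi - gi / L for yi, gi in zip(y, g)]
    return max_iter


# Usage: an assumed ill-conditioned quadratic with L = 1000, mu = 1.
diag = [1.0, 1000.0]
f, grad = make_quadratic(diag)
x0 = [1.0, 1.0]
tol = 1e-8 * f(x0)  # stop once the objective has dropped by a factor of 1e8
gd_steps = gradient_descent(f, grad, x0, L=max(diag), tol=tol)
agd_steps = nesterov(f, grad, x0, L=max(diag), mu=min(diag), tol=tol)
print("gradient descent steps:", gd_steps)
print("accelerated steps:", agd_steps)
```

On this problem the accelerated method should need far fewer gradient evaluations than plain gradient descent. The choice of momentum β = (√κ − 1)/(√κ + 1) assumes μ and L are known. When μ is unknown, the convex-case schedule with momentum (k − 1)/(k + 2) is a common alternative.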