Dual Averaging is Surprisingly Effective for Deep Learning Optimization

Open in new window