6d0bf1265ea9635fb4f9d56f16d7efb2-Paper-Conference.pdf

Neural Information Processing Systems 

Recent works have shown that line search methods can speed up Stochastic Gradient Descent (SGD) and Adam in modern over-parameterized settings. However, existing line searches may take steps that are smaller than necessary since they require a monotone decrease of the (mini-)batch objective function.