Neural Information Processing Systems 

In OrMo, momentum is incorporated into ASGD by organizing the gradients in order based on their iteration indexes. We theoretically prove the convergence of OrMo with both constant and delay-adaptive learning rates for non-convex problems.