MKOR: Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates

Neural Information Processing Systems 

This work proposes MKOR, a Momentum-Enabled Kronecker-Factor-Based Optimizer Using Rank-1 Updates, which reduces the training time and improves the convergence properties of deep neural networks (DNNs). Second-order techniques, while enjoying higher convergence rates than their first-order counterparts, have cubic complexity with respect to the model size and/or the training batch size.
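To make the cubic-complexity claim concrete, below is a minimal sketch of a classical Newton-type second-order update; the notation ($\theta_t$, $H_t$, $d$, $\eta$) is illustrative and not taken from the paper.

```latex
% Newton-type second-order update for a model with d parameters:
% the curvature matrix H_t (Hessian or Fisher) is d x d, so forming
% its inverse costs O(d^3) time and O(d^2) memory.
\[
  \theta_{t+1} = \theta_t - \eta\, H_t^{-1} \nabla_\theta \mathcal{L}(\theta_t),
  \qquad H_t \in \mathbb{R}^{d \times d},
  \qquad \text{cost of } H_t^{-1} \in \mathcal{O}(d^3).
\]
```

Kronecker-factored approximations sidestep this cost by replacing $H_t$ with per-layer Kronecker products of much smaller factors, the structure that MKOR builds on with its rank-1 updates.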