ZO-AdaMM: Zeroth-Order Adaptive Momentum Method for Black-Box Optimization
Xiangyi Chen, Sijia Liu, Kaidi Xu, Xingguo Li, Xue Lin, Mingyi Hong, David Cox
Neural Information Processing Systems
The adaptive momentum method (AdaMM), which uses past gradients to update descent directions and learning rates simultaneously, has become one of the most popular first-order optimization methods for solving machine learning problems. However, AdaMM is not suited for solving black-box optimization problems, where explicit gradient forms are difficult or infeasible to obtain. In this paper, we propose a zeroth-order AdaMM (ZO-AdaMM) algorithm that generalizes AdaMM to the gradient-free regime. We show that the convergence rate of ZO-AdaMM for both convex and nonconvex optimization is roughly a factor of $O(\sqrt{d})$ worse than that of the first-order AdaMM algorithm, where $d$ is the problem dimension. In particular, we provide a deep understanding of why the Mahalanobis distance matters in the convergence of ZO-AdaMM and other AdaMM-type methods.
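To make the idea concrete, below is a minimal sketch of a ZO-AdaMM-style update loop: Adam-style moment updates driven by a zeroth-order gradient estimate built from function evaluations only. The function name, hyperparameter defaults, the one-sided two-point Gaussian estimator, and the AMSGrad-style running max on the second moment are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def zo_adamm(f, x0, mu=0.01, lr=0.01, beta1=0.9, beta2=0.999,
             eps=1e-8, steps=1000, rng=None):
    """Sketch of a ZO-AdaMM-style optimizer (hypothetical interface).

    Uses only function values of f, never explicit gradients.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    d = x.size
    m = np.zeros(d)       # first-moment (momentum) estimate
    v = np.zeros(d)       # second-moment estimate
    v_hat = np.zeros(d)   # running max of v (AMSGrad-style, assumed variant)
    for _ in range(steps):
        u = rng.standard_normal(d)                   # random probe direction
        # Two-point zeroth-order gradient estimate via Gaussian smoothing
        g = (d / mu) * (f(x + mu * u) - f(x)) * u
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        v_hat = np.maximum(v_hat, v)                 # monotone step-size scaling
        x -= lr * m / (np.sqrt(v_hat) + eps)
    return x

# Example: minimize a quadratic using only black-box function evaluations.
x_star = zo_adamm(lambda x: np.sum((x - 1.0) ** 2), x0=np.zeros(10))
```

Note how the estimator's $d/\mu$ scaling makes the variance of the gradient estimate grow with the dimension, which is consistent with the $O(\sqrt{d})$ gap in convergence rate stated in the abstract.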