Reviews: Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
In this paper, the authors propose a novel variance-reduced zeroth-order method for nonconvex optimization, prove theoretical results for three different gradient estimators, and demonstrate the performance of the method on two machine learning tasks. The theoretical results highlight the differences and trade-offs between the gradient estimators, and the numerical results show that these trade-offs (estimate accuracy, convergence rate, iterations, and function queries) are actually realized in practice. Overall, the paper is well structured and thought out (both the theoretical and empirical portions), and the results are interesting in my opinion (for both the ML and optimization communities); as such, I recommend this paper for publication at NIPS.
- The paper is very well written and motivated, and is very easy to read. The authors should clearly state the differences, both algorithmic and theoretical. Is it fair to say that these differences are due to the errors in the gradient estimates?
Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, Lisa Amini
As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations).
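To make the two-point random gradient estimator and the variance-reduced update concrete, here is a minimal Python/NumPy sketch. The estimator follows the standard zeroth-order form $\hat{\nabla} f(x) = \frac{d}{\mu}\,(f(x+\mu u) - f(x))\,u$ with a random unit direction $u$ and smoothing parameter $\mu$; the function names, the outer ZO-SVRG-style loop, and all parameter defaults below are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def two_point_grad_est(f, x, mu=1e-3, rng=np.random):
    """Two-point random gradient estimator (standard ZO form; an assumed sketch).

    Returns (d / mu) * (f(x + mu*u) - f(x)) * u for a random unit direction u.
    For finite mu this estimate is biased, which is why the unbiasedness
    assumption of first-order SVRG analyses no longer holds and an extra
    O(1/b) error term can appear, as discussed in the abstract.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # random direction on the unit sphere
    return (d / mu) * (f(x + mu * u) - f(x)) * u


def zo_svrg_sketch(f_i, x0, n, epochs=10, inner=50, b=5, lr=0.01, seed=0):
    """Illustrative ZO-SVRG-style loop: an SVRG control variate built from ZO estimates.

    f_i(x, i) evaluates the i-th component function; all names and defaults
    here are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        x_ref = x.copy()
        # Full (averaged) ZO gradient estimate at the reference point.
        g_ref = np.mean(
            [two_point_grad_est(lambda z, i=i: f_i(z, i), x_ref, rng=rng) for i in range(n)],
            axis=0,
        )
        for _ in range(inner):
            batch = rng.choice(n, size=b, replace=False)   # mini-batch of size b
            g_x = np.mean([two_point_grad_est(lambda z, i=i: f_i(z, i), x, rng=rng)
                           for i in batch], axis=0)
            g_r = np.mean([two_point_grad_est(lambda z, i=i: f_i(z, i), x_ref, rng=rng)
                           for i in batch], axis=0)
            x = x - lr * (g_x - g_r + g_ref)               # variance-reduced update
    return x
```

Only function evaluations of f_i are used, so the sketch reflects the black-box (gradient-free) setting the applications above require; the accelerated variants mentioned in the abstract would replace the plain two-point estimator with the paper's variance-reduced gradient estimators.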