Reviews: Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
In this paper, the authors propose a novel variance-reduced zeroth-order method for nonconvex optimization, prove theoretical results for three different gradient estimators, and demonstrate the performance of the method on two machine learning tasks. The theoretical results highlight the differences and trade-offs between the gradient estimators, and the numerical results show that these trade-offs (estimate accuracy, convergence rate, iterations, and function queries) are actually realized in practice. Overall, the paper is well structured and thought out (both the theoretical and empirical portions), and the results are interesting in my opinion (for both the ML and optimization communities); as such, I recommend this paper for publication at NIPS.
- The paper is very well written and motivated, and is very easy to read. The authors should clearly state the differences, both algorithmic and theoretical. Is it fair to say that these differences are due to the errors in the gradient estimates?
Zeroth-Order Stochastic Variance Reduction for Nonconvex Optimization
Sijia Liu, Bhavya Kailkhura, Pin-Yu Chen, Paishun Ting, Shiyu Chang, Lisa Amini
As application demands for zeroth-order (gradient-free) optimization accelerate, the need for variance reduced and faster converging approaches is also intensifying. This paper addresses these challenges by presenting: a) a comprehensive theoretical analysis of variance reduced zeroth-order (ZO) optimization, b) a novel variance reduced ZO algorithm, called ZO-SVRG, and c) an experimental evaluation of our approach in the context of two compelling applications, black-box chemical material classification and generation of adversarial examples from black-box deep neural network models. Our theoretical analysis uncovers an essential difficulty in the analysis of ZO-SVRG: the unbiased assumption on gradient estimates no longer holds. We prove that compared to its first-order counterpart, ZO-SVRG with a two-point random gradient estimator could suffer an additional error of order $O(1/b)$, where $b$ is the mini-batch size. To mitigate this error, we propose two accelerated versions of ZO-SVRG utilizing variance reduced gradient estimators, which achieve the best rate known for ZO stochastic optimization (in terms of iterations).
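To make the two-point random gradient estimator and the variance-reduced update concrete, here is a minimal Python/NumPy sketch. The estimator follows the standard zeroth-order form $\hat{\nabla} f(x) = \frac{d}{\mu}\,(f(x+\mu u) - f(x))\,u$ with a random unit direction $u$ and smoothing parameter $\mu$; the function names, the outer ZO-SVRG-style loop, and all parameter defaults below are illustrative assumptions, not the authors' released implementation.

```python
import numpy as np

def two_point_grad_est(f, x, mu=1e-3, rng=np.random):
    """Two-point random gradient estimator (standard ZO form; an assumed sketch).

    Returns (d / mu) * (f(x + mu*u) - f(x)) * u for a random unit direction u.
    For finite mu this estimate is biased, which is why the unbiasedness
    assumption of first-order SVRG analyses no longer holds and an extra
    O(1/b) error term can appear, as discussed in the abstract.
    """
    d = x.size
    u = rng.standard_normal(d)
    u /= np.linalg.norm(u)          # random direction on the unit sphere
    return (d / mu) * (f(x + mu * u) - f(x)) * u


def zo_svrg_sketch(f_i, x0, n, epochs=10, inner=50, b=5, lr=0.01, seed=0):
    """Illustrative ZO-SVRG-style loop: an SVRG control variate built from ZO estimates.

    f_i(x, i) evaluates the i-th component function; all names and defaults
    here are assumptions for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = x0.copy()
    for _ in range(epochs):
        x_ref = x.copy()
        # Full (averaged) ZO gradient estimate at the reference point.
        g_ref = np.mean(
            [two_point_grad_est(lambda z, i=i: f_i(z, i), x_ref, rng=rng) for i in range(n)],
            axis=0,
        )
        for _ in range(inner):
            batch = rng.choice(n, size=b, replace=False)   # mini-batch of size b
            g_x = np.mean([two_point_grad_est(lambda z, i=i: f_i(z, i), x, rng=rng)
                           for i in batch], axis=0)
            g_r = np.mean([two_point_grad_est(lambda z, i=i: f_i(z, i), x_ref, rng=rng)
                           for i in batch], axis=0)
            x = x - lr * (g_x - g_r + g_ref)               # variance-reduced update
    return x
```

Only function evaluations of f_i are used, so the sketch reflects the black-box (gradient-free) setting the applications above require; the accelerated variants mentioned in the abstract would replace the plain two-point estimator with the paper's variance-reduced gradient estimators.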