PseuZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks
Neural Information Processing Systems
Zeroth-order optimization (ZO) has received wide attention in machine learning, especially when computing the full gradient is expensive or even impossible. Recently, ZO has emerged as an important paradigm for memory-efficient fine-tuning of large language models (LLMs), circumventing the memory overhead of backpropagation. However, the variance of existing ZO gradient estimators scales as $\Theta(d)$ in the parameter dimension $d$; without further assumptions on the objective function, this leads to dimension-dependent convergence rates, which is prohibitive at the scale of LLM parameters.
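To make the $\Theta(d)$ variance claim concrete, here is a minimal sketch of a generic two-point Gaussian-smoothing ZO estimator, evaluated on a toy quadratic. This is the standard textbook estimator, not the PseuZO method; the function names (`zo_two_point_grad`, `quadratic`, `mean_squared_error`), the smoothing radius `mu`, and the test objective are all assumptions chosen for illustration. For $f(x) = \tfrac{1}{2}\|x\|^2$ with $\|x\| = 1$, the expected squared error of this estimator works out to $d + 1$, so it grows linearly with the dimension.

```python
import numpy as np


def zo_two_point_grad(f, x, mu, rng):
    """Generic two-point ZO gradient estimate (illustrative, not PseuZO).

    Uses only two function evaluations along a random Gaussian direction u:
        g = (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u
    """
    u = rng.standard_normal(x.shape)
    return (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u


def quadratic(x):
    """Toy objective f(x) = 0.5 * ||x||^2, whose true gradient is x."""
    return 0.5 * float(x @ x)


def mean_squared_error(d, n_samples=5000, mu=1e-3, seed=0):
    """Empirical E||g - grad f(x)||^2 at a unit-norm point in dimension d."""
    rng = np.random.default_rng(seed)
    x = np.ones(d) / np.sqrt(d)  # unit-norm point; true gradient equals x
    errors = [
        np.sum((zo_two_point_grad(quadratic, x, mu, rng) - x) ** 2)
        for _ in range(n_samples)
    ]
    return float(np.mean(errors))


if __name__ == "__main__":
    # The error should grow roughly in proportion to d.
    for d in (10, 100):
        print(d, mean_squared_error(d))
```

Each estimate costs two forward evaluations and no backpropagation, which is where the memory savings come from; the price is that a single random direction carries little information about a $d$-dimensional gradient.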
Jun-13-2026, 11:41:33 GMT