PseuZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks

Jun-13-2026, 11:41:33 GMT–Neural Information Processing Systems

Zeroth-order optimization (ZO) has received wide attention in machine learning, especially when computing the full gradient is expensive or even impossible. Recently, ZO has emerged as an important paradigm for memory-efficient fine-tuning of large language models (LLMs), since it circumvents the memory overhead of backpropagation. However, the variance of existing ZO gradient estimators scales as $\Theta(d)$ in the parameter dimension $d$. Without further assumptions on the objective function, this leads to dimension-dependent convergence rates, which is prohibitive at the parameter scale of LLMs.
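To make the dimension dependence concrete, here is a minimal sketch of a classical two-point Gaussian-smoothing ZO estimator. It is not the PseuZO method. The function names `zo_gradient` and `mean_squared_error`, the smoothing radius `mu`, and the quadratic test objective are all assumptions chosen for illustration.

```python
import random


def zo_gradient(f, x, mu=1e-3, rng=random):
    """Estimate grad f(x) from two function evaluations along a random direction."""
    u = [rng.gauss(0.0, 1.0) for _ in x]              # random Gaussian direction
    x_plus = [xi + mu * ui for xi, ui in zip(x, u)]
    x_minus = [xi - mu * ui for xi, ui in zip(x, u)]
    # Central finite difference approximates the directional derivative along u.
    scale = (f(x_plus) - f(x_minus)) / (2.0 * mu)
    return [scale * ui for ui in u]


def mean_squared_error(d, trials=2000, seed=0):
    """Average squared error of the estimator on f(x) = 0.5 * ||x||^2 in dimension d."""
    rng = random.Random(seed)

    def f(x):
        return 0.5 * sum(xi * xi for xi in x)         # true gradient is x itself

    x = [1.0 / d ** 0.5] * d                          # unit-norm point for every d
    total = 0.0
    for _ in range(trials):
        g = zo_gradient(f, x, rng=rng)
        total += sum((gi - xi) ** 2 for gi, xi in zip(g, x))
    return total / trials


if __name__ == "__main__":
    for d in (4, 16, 64):
        print(d, mean_squared_error(d))
```

On this particular quadratic, with the gradient norm held at 1, the expected squared error of the estimator works out to $d + 1$, so the printed values should grow roughly linearly in `d`. No backward pass is needed, only forward evaluations of `f`, which is where the memory savings over backpropagation come from.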


Technology:
- Information Technology > Artificial Intelligence
  - Natural Language > Large Language Model (0.80)
  - Machine Learning > Neural Networks (0.53)