ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization
Sugiura, Keisuke, Matsutani, Hiroki
arXiv.org Artificial Intelligence
First-order (FO) optimization algorithms with backpropagation (BP) [1, 2, 3, 4, 5] have been predominantly used for training deep neural networks (DNNs), thanks to their wide support in popular deep learning (DL) frameworks. While BP provides a systematic way to compute FO gradients via the chain rule by traversing the computational graph, it needs to save intermediate activations as well as gradients (with respect to parameters), which incurs considerably higher memory requirements than inference [6] and may pose challenges for deployment on memory-constrained platforms (e.g., Raspberry Pi Zero). Besides, advanced FO optimizers consume extra memory to store optimizer states such as momentum (a running average of past gradients) and a copy of the trainable parameters. Given this situation, zeroth-order (ZO) optimization has recently seen a resurgence of interest as a simple yet powerful alternative to FO methods [7, 8]. One notable feature of ZO methods is that they require only two forward passes per input during training. Since ZO gradients can be obtained from DNN outputs (loss values) alone, a ZO-based approach becomes an attractive choice when FO gradients are infeasible to obtain or unavailable (e.g., with non-differentiable loss functions). ZO optimization has been applied to a wide range of practical applications, including black-box adversarial attacks [9, 10, 11] (where attackers only have access to DNN inputs and outputs), black-box defense [12, 13], neural architecture search [14, 15], sensor selection in wireless networks [16], coverage maximization in cellular networks [17, 18], and reinforcement learning from human feedback [19, 20]. Since ZO methods bypass BP, they do not need to retain computational graphs, intermediate activations, or gradients.
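To make the "two forward passes" idea concrete, the following is a minimal sketch of a generic two-point (SPSA-style) ZO gradient estimate in NumPy. It is not taken from the paper: the excerpt does not specify the exact estimator used, and the names `zo_gradient`, `quadratic_loss`, the perturbation scale `eps`, and the learning rate are all hypothetical choices for illustration.

```python
import numpy as np


def zo_gradient(loss_fn, params, eps=1e-3, rng=None):
    """Two-point (SPSA-style) zeroth-order gradient estimate.

    Assumption: a generic estimator for illustration, not necessarily
    the one used in the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    z = rng.standard_normal(params.shape)   # random perturbation direction
    loss_plus = loss_fn(params + eps * z)   # forward pass 1
    loss_minus = loss_fn(params - eps * z)  # forward pass 2
    # Finite-difference estimate of the directional derivative along z,
    # scaled back onto z to form a gradient estimate.
    return (loss_plus - loss_minus) / (2.0 * eps) * z


def quadratic_loss(w):
    # Hypothetical toy loss standing in for a DNN forward pass.
    return float(np.sum((w - 3.0) ** 2))


rng = np.random.default_rng(0)
w = np.zeros(4)
for _ in range(2000):
    # Plain SGD-style update using only loss values, no backpropagation.
    w -= 0.01 * zo_gradient(quadratic_loss, w, rng=rng)
```

In this sketch the only state held between the two loss evaluations is the parameter vector, the perturbation `z`, and two scalar losses. No intermediate activations or computational graph are stored, which is the memory argument made in the paragraph above.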
Jan-8-2025