ElasticZO: A Memory-Efficient On-Device Learning with Combined Zeroth- and First-Order Optimization

Sugiura, Keisuke, Matsutani, Hiroki

arXiv.org Artificial Intelligence 

First-order (FO) optimization algorithms with backpropagation (BP) [1, 2, 3, 4, 5] have been predominantly used for training deep neural networks (DNNs), thanks to their wide support in popular deep learning (DL) frameworks. While BP provides a systematic way to compute FO gradients via the chain rule by traversing the computational graph, it needs to save intermediate activations as well as gradients (with respect to parameters). This incurs considerably higher memory requirements than inference [6] and may pose challenges for deployment on memory-constrained platforms (e.g., Raspberry Pi Zero). Moreover, advanced FO optimizers consume extra memory to store optimizer states such as momentum (a running average of past gradients) and a copy of the trainable parameters. Given this situation, zeroth-order (ZO) optimization has seen a resurgence of interest in the recent literature as a simple yet powerful alternative to FO methods [7, 8]. One notable feature of ZO methods is that they require only two forward passes per input during training. Since ZO gradients can be obtained from DNN outputs (loss values) alone, a ZO-based approach becomes an attractive choice when FO gradients are infeasible to obtain or unavailable (e.g., with non-differentiable loss functions). ZO optimization has been applied to a wide range of practical applications, including black-box adversarial attacks [9, 10, 11] (where attackers only have access to DNN inputs and outputs), black-box defense [12, 13], neural architecture search [14, 15], sensor selection in wireless networks [16], coverage maximization in cellular networks [17, 18], and reinforcement learning from human feedback [19, 20]. Since ZO methods bypass BP, they do not need to retain computational graphs, intermediate activations, or gradients.
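To make the "two forward passes" idea concrete, the following is a minimal sketch of a two-point ZO gradient estimate. It assumes a symmetric random-perturbation estimator (often called SPSA-style), which this excerpt does not itself specify. The names `zo_gradient`, `quadratic_loss`, and `run_zo_sgd`, the toy loss, and all hyperparameter values are hypothetical and chosen only for illustration.

```python
import numpy as np

def zo_gradient(loss_fn, theta, eps=1e-3, rng=None):
    """Estimate a gradient from two loss evaluations (no backpropagation).

    Assumption: a symmetric two-point estimator with a Gaussian
    perturbation direction z; other ZO estimators exist.
    """
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.standard_normal(theta.shape)      # random perturbation direction
    loss_plus = loss_fn(theta + eps * z)      # forward pass 1
    loss_minus = loss_fn(theta - eps * z)     # forward pass 2
    # Scalar directional-derivative estimate, projected back onto z.
    return (loss_plus - loss_minus) / (2.0 * eps) * z

def quadratic_loss(theta):
    """Toy stand-in for a DNN loss; its minimum is at theta = 1."""
    return float(np.sum((theta - 1.0) ** 2))

def run_zo_sgd(steps=1000, lr=0.01, seed=0):
    """Plain SGD driven only by ZO gradient estimates."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(4)
    for _ in range(steps):
        theta = theta - lr * zo_gradient(quadratic_loss, theta, rng=rng)
    return theta

if __name__ == "__main__":
    theta = run_zo_sgd()
    print("final loss:", quadratic_loss(theta))
```

Only loss values enter the update, so nothing in this sketch stores activations or a computational graph; the cost is a noisier gradient estimate than BP would give.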
