One Forward is Enough for Neural Network Training via Likelihood Ratio Method
Jinyang Jiang, Zeliang Zhang, Chenliang Xu, Zhaofei Yu, Yijie Peng
While backpropagation (BP) is the mainstream approach for gradient computation in neural network training, its heavy reliance on the chain rule of differentiation constrains the design flexibility of network architectures and training pipelines. We avoid the recursive computation in BP and develop a unified likelihood ratio (ULR) method that estimates gradients with just one forward propagation. Not only can ULR be extended to train a wide variety of neural network architectures, but the computation flow of BP can also be rearranged by ULR for better adaptation to the underlying device. Moreover, we propose several variance reduction techniques to further accelerate the training process. Our experiments provide numerical results across diverse settings, including various neural network training scenarios, computation flow rearrangement, and fine-tuning of pre-trained models. All findings demonstrate that ULR enhances the flexibility of neural network training by permitting localized module training without compromising the global objective, and significantly boosts network robustness.

Since backpropagation (BP) (Rumelhart et al., 1986) has greatly facilitated the success of artificial intelligence (AI) in various real-world scenarios (Song et al., 2021a;b; Sung et al., 2021), researchers have been motivated to connect this gradient computation method for neural network training with human learning behavior (Scellier & Bengio, 2017; Lillicrap et al., 2020). However, there is no evidence that the learning mechanism in biological neurons relies on BP (Hinton, 2022). Pursuing alternatives to BP holds promise not only for advancing our understanding of learning mechanisms but also for developing more robust and interpretable AI systems. Moreover, the significant computational cost associated with BP (Gomez et al., 2017; Zhu et al., 2022) also calls for innovations that simplify and expedite the training process without heavy resource consumption. There have been continuous efforts to substitute BP in neural network training.
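To make the single-forward-pass idea concrete, the sketch below estimates a layer's weight gradient with the classical likelihood ratio (score-function) identity: if Gaussian noise is injected into the pre-activation, z = Wx + ε with ε ~ N(0, σ²I), then ∇_W E[L(z)] = E[L(z) (ε/σ²) xᵀ], so the gradient can be estimated from forward evaluations of the loss alone. The one-layer network, tanh loss, and noise scale here are our own illustrative assumptions, not the paper's setup or code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting (illustrative assumptions): a single linear layer with
# Gaussian noise injected into the pre-activation, z = W @ x + eps,
# and a loss treated as a black box evaluated only by forward passes.
d_in, d_out, sigma = 4, 3, 0.1
W = rng.normal(size=(d_out, d_in))
x = rng.normal(size=d_in)
target = rng.normal(size=d_out)

def loss(z):
    # Black-box objective: never differentiated by the estimator.
    return 0.5 * np.sum((np.tanh(z) - target) ** 2, axis=-1)

# Likelihood ratio (score-function) estimator: since z ~ N(Wx, sigma^2 I),
#   grad_W E[loss(z)] = E[ loss(z) * (eps / sigma^2) x^T ].
n = 200_000
eps = rng.normal(scale=sigma, size=(n, d_out))   # injected noise
z = x @ W.T + eps                                # noisy forward pass
scores = eps / sigma**2                          # d log N(z; Wx) / d mean
grad_lr = np.outer((loss(z)[:, None] * scores).mean(axis=0), x)

# Sanity check against the pathwise gradient, available here only
# because this toy loss happens to be differentiable.
dz = (np.tanh(z) - target) * (1 - np.tanh(z) ** 2)
grad_ref = np.outer(dz.mean(axis=0), x)

print(np.max(np.abs(grad_lr - grad_ref)))  # gap shrinks as n grows
```

Score-function estimates like this are noisy, which is why variance reduction matters in practice: common generic remedies include subtracting a baseline from the loss before weighting by the score, or antithetic sampling over ±ε. These are standard Monte Carlo tricks and not necessarily the specific techniques the paper proposes.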
arXiv.org Artificial Intelligence
Oct-13-2023