Backpropagation





An In-depth Study of Stochastic Backpropagation

Neural Information Processing Systems

In particular, we discuss the following: Section 8.1 derives the gradient calculation for attention layers. Section 8.4 investigates insights on the gradient keep-ratios and gradient keep masks. Section 8.6 compares model similarity with and without applying SBP. In Section 3.2, we provide the gradient calculation of linear layers (or PW-Conv) and general convolutional layers for the backward phase of SBP; as with Eq. (18), it is an approximated version of the original. The MLP sub-block is equivalent to two PW-Conv or linear layers.
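The backward phase described above can be illustrated with a minimal sketch for a linear (or PW-Conv) layer: only a random subset of per-sample gradient rows, controlled by a keep-ratio, contributes to the weight gradient, and the survivors are rescaled. The function name, the `keep_ratio` parameter, and the unbiased rescaling are illustrative assumptions, not the paper's exact Eq. (18).

```python
import numpy as np

rng = np.random.default_rng(0)

def sbp_linear_backward(x, grad_out, keep_ratio=0.5):
    """Approximate dL/dW for y = x @ W using a random gradient keep mask.

    Illustrative sketch of the stochastic-backpropagation idea; the exact
    masking and rescaling in the paper may differ.
    """
    n = x.shape[0]
    keep = rng.random(n) < keep_ratio        # per-sample keep mask
    # Rescale survivors so the masked estimate is unbiased in expectation.
    return (x[keep].T @ grad_out[keep]) / keep_ratio

x = rng.standard_normal((128, 8))
g = rng.standard_normal((128, 4))
grad_w = sbp_linear_backward(x, g)           # same shape as dense dL/dW
```

With `keep_ratio=1.0` the mask keeps every sample and the function reduces to the exact dense gradient `x.T @ grad_out`, which is a quick sanity check on the rescaling.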


Convergence and Alignment of Gradient Descent with Random Backpropagation Weights. Ganlin Song, Ruitu Xu, John Lafferty, Department of Statistics and Data Science

Neural Information Processing Systems

Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure: updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible "feedback alignment" algorithm that uses random and fixed backpropagation weights, and show promising simulations. In this paper we study the mathematical properties of the feedback alignment procedure by analyzing convergence and alignment for two-layer networks under squared error loss. In the overparameterized setting, we prove that the error converges to zero exponentially fast, and also that regularization is necessary in order for the parameters to become aligned with the random backpropagation weights. Simulations are given that are consistent with this analysis and suggest further generalizations. These results contribute to our understanding of how biologically plausible algorithms might carry out weight learning in a manner different from Hebbian learning, with performance that is comparable with the full non-local backpropagation algorithm.
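The feedback alignment procedure for the two-layer setting described in this abstract can be sketched in a few lines: the hidden layer's error signal is routed through a fixed random vector `B` instead of the trained output weights. All sizes, the learning rate, and the synthetic regression target below are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, h = 64, 5, 32                        # samples, input dim, hidden units
X = rng.standard_normal((n, d))
y = np.sin(X @ rng.standard_normal(d))     # arbitrary smooth regression target

W1 = rng.standard_normal((d, h)) * 0.5     # trained forward weights
w2 = rng.standard_normal(h) * 0.5
B = rng.standard_normal(h)                 # fixed random feedback weights

def loss(W1, w2):
    return 0.5 * np.mean((np.tanh(X @ W1) @ w2 - y) ** 2)

loss_before = loss(W1, w2)
lr = 0.05
for _ in range(500):
    z = np.tanh(X @ W1)                    # hidden activations
    err = z @ w2 - y                       # dL/dpred for squared error loss
    # Backpropagation would route err through w2 here; feedback alignment
    # substitutes the fixed random vector B instead.
    delta = np.outer(err, B) * (1 - z ** 2)
    W1 -= lr * X.T @ delta / n
    w2 -= lr * z.T @ err / n
loss_after = loss(W1, w2)
```

Despite the backward weights never being learned, the training loss still decreases, consistent with the alignment phenomenon the paper analyzes.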



Parallel Backpropagation for Shared-Feature Visualization

Neural Information Processing Systems

High-level visual brain regions contain subareas in which neurons appear to respond more strongly to examples of a particular semantic category, like faces or bodies, rather than objects. However, recent work has shown that while this finding holds on average, some out-of-category stimuli also activate neurons in these regions.



Bridging Discrete and Backpropagation: Straight-Through and Beyond. Liyuan Liu, Chengyu Dong, Xiaodong Liu, Bin Yu, Jianfeng Gao, Microsoft Research

Neural Information Processing Systems

Backpropagation, the cornerstone of deep learning, is limited to computing gradients for continuous variables. This limitation poses challenges for problems involving discrete latent variables. To address this issue, we propose a novel approach to approximate the gradient of parameters involved in generating discrete latent variables. First, we examine the widely used Straight-Through (ST) heuristic and demonstrate that it works as a first-order approximation of the gradient. Guided by our findings, we propose ReinMax, which achieves second-order accuracy by integrating Heun's method, a second-order numerical method for solving ODEs. ReinMax does not require Hessian or other second-order derivatives, thus having negligible computation overheads. Extensive experimental results on various tasks demonstrate the superiority of ReinMax over the state of the art.
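The Straight-Through heuristic the abstract analyzes can be sketched directly: the forward pass emits a hard one-hot sample, while the backward pass substitutes the softmax Jacobian for the non-differentiable sampling step, which is exactly the first-order approximation the paper identifies. The function names below are illustrative; ReinMax's second-order (Heun's method) correction is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(theta):
    e = np.exp(theta - theta.max())        # numerically stable softmax
    return e / e.sum()

def st_sample_and_grad(theta, downstream_grad):
    """Sample a one-hot vector; return the Straight-Through gradient w.r.t. theta."""
    p = softmax(theta)
    k = rng.choice(len(theta), p=p)        # discrete sample (non-differentiable)
    one_hot = np.eye(len(theta))[k]
    # Backward: pretend the sample was the softmax, so the gradient flows
    # through the softmax Jacobian (a first-order approximation).
    jac = np.diag(p) - np.outer(p, p)
    return one_hot, jac @ downstream_grad

theta = np.array([0.2, -0.5, 1.0])
one_hot, g = st_sample_and_grad(theta, downstream_grad=np.array([1.0, 0.0, 0.0]))
```

Because the rows of the softmax Jacobian sum to zero, the resulting logit gradient also sums to zero, so ST only redistributes probability mass among the categories rather than changing its total.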


Supplementary Material GAIT-prop: A biologically plausible learning rule derived from backpropagation of error

Neural Information Processing Systems

The GAIT-prop and ITP targets are implemented as a weak perturbation of the forward pass. The table below presents the relevant parameters (Parameter / Value: Learning Rate of Adam Optimiser {10). The results report peak and final (end-of-training) accuracy on the training set (organised as 'peak / final'). Parameters shown in bold were chosen and used for all results presented in the main paper. We find that target propagation often does best when early stopping is implemented to 'catch' this peak, unlike the other two algorithms, which have asymptotic performance. In the main paper, we showed that GAIT-propagation produces networks with final training/test accuracies that are indistinguishable from those produced by backpropagation of error. The performance of deep multi-layer perceptrons trained by BP and GAIT-prop.