Backpropagation
Reinforcement Learning Without Backpropagation or a Clock
Kostas, James, Nota, Chris, Thomas, Philip S.
Reinforcement learning (RL) algorithms share qualitative similarities with the algorithms implemented byanimal brains. However, there remain clear differences between these two types of algorithms. For example, while RL algorithms using artificial neural networks require information to flow backwards through the network via the backpropagation algorithm, there is currently debate about whether this is feasible in biological neural implementations (Werbos and Davis, 2016). Policy gradient coagent networks (PGCNs) are a class of RL algorithms that were introduced to remove this possibly biologically implausible property of RL algorithms--they use artificial neural networks but do not use the backpropagation algorithm (Thomas, 2011). Since their introduction, PGCN algorithms have proven to be not only a possible improvement in biological plausibility, but a practical tool for improving RL agents. They were used to solve RL problems with high-dimensional action spaces (Thomas and Barto, 2012), are the RL precursor to the more general stochastic computation graphs (Schulman et al., 2015), and, as we will show in this paper, generalize the recently proposed option-critic architecture (Bacon et al., 2017), while drastically simplifying key derivations.
Learning Undirected Posteriors by Backpropagation through MCMC Updates
Vahdat, Arash, Andriyash, Evgeny, Macready, William G.
The representation of the posterior is a critical aspect of effective variational autoencoders (VAEs). Poor choices for the posterior have a detrimental impact on the generative performance of VAEs due to the mismatch with the true posterior. We extend the class of posterior models that may be learned by using undirected graphical models. We develop an efficient method to train undirected posteriors by showing that the gradient of the training objective with respect to the parameters of the undirected posterior can be computed by backpropagation through Markov chain Monte Carlo updates. We apply these gradient estimators for training discrete VAEs with Boltzmann machine posteriors and demonstrate that undirected models outperform previous results obtained using directed graphical models as posteriors.
The Backpropagation Algorithm Demystified
The first thing people think of when they hear the term "Machine Learning" goes a little something like the Matrix. All around, there are computers taking over the world, let alone the human race. In any case, people generally just want nothing to do with it. What if I told you those people don't even know what machine learning and things like backpropagation really are? Then you can go back to worrying about the robot-led apocalypse that's supposed to happen next Friday.
Dendritic cortical microcircuits approximate the backpropagation algorithm
Sacramento, Joรฃo, Costa, Rui Ponte, Bengio, Yoshua, Senn, Walter
Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances โ error backpropagation โ appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.
Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks
Jin, Yingyezhe, Zhang, Wenrui, Li, Peng
Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. However, SNNs are yet to reach the same performances of conventional deep artificial neural networks (ANNs), a long-standing challenge due to complex dynamics and non-differentiable spike events encountered in training. The existing SNN error backpropagation (BP) methods are limited in terms of scalability, lack of proper handling of spiking discontinuities, and/or mismatch between the rate-coded loss function and computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level, computed and back-propagated across both macroscopic and microscopic levels. Different from existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14] and dynamic neuromorphic N-MNIST [26]. HM2-BP achieves an accuracy level of 99.49% and 98.88% for MNIST and N-MNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. Furthermore, the HM2-BP produces the highest accuracies based on SNNs for the EMNIST [3] dataset, and leads to high recognition accuracy for the 16-speaker spoken English letters of TI46 Corpus [16], a challenging patio-temporal speech recognition benchmark for which no prior success based on SNNs was reported. It also achieves competitive performances surpassing those of conventional deep learning models when dealing with asynchronous spiking streams.
Hybrid Macro/Micro Level Backpropagation for Training Deep Spiking Neural Networks
Jin, Yingyezhe, Zhang, Wenrui, Li, Peng
Spiking neural networks (SNNs) are positioned to enable spatio-temporal information processing and ultra-low power event-driven neuromorphic hardware. However, SNNs are yet to reach the same performances of conventional deep artificial neural networks (ANNs), a long-standing challenge due to complex dynamics and non-differentiable spike events encountered in training. The existing SNN error backpropagation (BP) methods are limited in terms of scalability, lack of proper handling of spiking discontinuities, and/or mismatch between the rate-coded loss function and computed gradient. We present a hybrid macro/micro level backpropagation (HM2-BP) algorithm for training multi-layer SNNs. The temporal effects are precisely captured by the proposed spike-train level post-synaptic potential (S-PSP) at the microscopic level. The rate-coded errors are defined at the macroscopic level, computed and back-propagated across both macroscopic and microscopic levels. Different from existing BP methods, HM2-BP directly computes the gradient of the rate-coded loss function w.r.t tunable parameters. We evaluate the proposed HM2-BP algorithm by training deep fully connected and convolutional SNNs based on the static MNIST [14] and dynamic neuromorphic N-MNIST [26]. HM2-BP achieves an accuracy level of 99.49% and 98.88% for MNIST and N-MNIST, respectively, outperforming the best reported performances obtained from the existing SNN BP algorithms. Furthermore, the HM2-BP produces the highest accuracies based on SNNs for the EMNIST [3] dataset, and leads to high recognition accuracy for the 16-speaker spoken English letters of TI46 Corpus [16], a challenging patio-temporal speech recognition benchmark for which no prior success based on SNNs was reported. It also achieves competitive performances surpassing those of conventional deep learning models when dealing with asynchronous spiking streams.
Dendritic cortical microcircuits approximate the backpropagation algorithm
Sacramento, Joรฃo, Costa, Rui Ponte, Bengio, Yoshua, Senn, Walter
Deep learning has seen remarkable developments over the last years, many of them inspired by neuroscience. However, the main learning mechanism behind these advances โ error backpropagation โ appears to be at odds with neurobiology. Here, we introduce a multilayer neuronal network model with simplified dendritic compartments in which error-driven synaptic plasticity adapts the network towards a global desired output. In contrast to previous work our model does not require separate phases and synaptic learning is driven by local dendritic prediction errors continuously in time. Such errors originate at apical dendrites and occur due to a mismatch between predictive input from lateral interneurons and activity from actual top-down feedback. Through the use of simple dendritic compartments and different cell-types our model can represent both error and normal activity within a pyramidal neuron. We demonstrate the learning capabilities of the model in regression and classification tasks, and show analytically that it approximates the error backpropagation algorithm. Moreover, our framework is consistent with recent observations of learning between brain areas and the architecture of cortical microcircuits. Overall, we introduce a novel view of learning on dendritic cortical circuits and on how the brain may solve the long-standing synaptic credit assignment problem.
Why ReLU Units Sometimes Die: Analysis of Single-Unit Error Backpropagation in Neural Networks
Douglas, Scott C., Yu, Jiutian
Recently, neural networks in machine learning use rectified linear units (ReLUs) in early processing layers for better performance. Training these structures sometimes results in "dying ReLU units" with near-zero outputs. We first explore this condition via simulation using the CIFAR-10 dataset and variants of two popular convolutive neural network architectures. Our explorations show that the output activation probability Pr[y>0] is generally less than 0.5 at system convergence for layers that do not employ skip connections, and this activation probability tends to decrease as one progresses from input layer to output layer. Employing a simplified model of a single ReLU unit trained by a variant of error backpropagation, we then perform a statistical convergence analysis to explore the model's evolutionary behavior. Our analysis describes the potentially-slower convergence speeds of dying ReLU units, and this issue can occur regardless of how the weights are initialized.
Training Multilayer Spiking Neural Networks using NormAD based Spatio-Temporal Error Backpropagation
Anwani, Navin, Rajendran, Bipin
Spiking neural networks (SNNs) have garnered a great amount of interest for supervised and unsupervised learning applications. This paper deals with the problem of training multilayer feedforward SNNs. The non-linear integrate-and-fire dynamics employed by spiking neurons make it difficult to train SNNs to output desired spike train in response to a given input. To tackle this, first the problem of training a multilayer SNN is formulated as an optimization problem such that its objective function is based on the deviation in membrane potential rather than the spike arrival instants. Then, an optimization method named Normalized Approximate Descent (NormAD), hand-crafted for such non-convex optimization problems, is employed to derive the iterative synaptic weight update rule. Next it is reformulated for a more efficient implementation, which can also be interpreted to be spatio-temporal error backpropagation. The learning rule is validated by employing it to solve generic spike based training problem as well as a spike based formulation of the XOR problem. Thus, the new algorithm is a key step towards building deep spiking neural networks capable of event-triggered learning.
Learning sparse transformations through backpropagation
Many transformations in deep learning architectures are sparsely connected. When such transformations cannot be designed by hand, they can be learned, even through plain backpropagation, for instance in attention mechanisms. However, during learning, such sparse structures are often represented in a dense form, as we do not know beforehand which elements will eventually become non-zero. We introduce the adaptive, sparse hyperlayer, a method for learning a sparse transformation, paramatrized sparsely: as index-tuples with associated values. To overcome the lack of gradients from such a discrete structure, we introduce a method of randomly sampling connections, and backpropagating over the randomly wired computation graph. To show that this approach allows us to train a model to competitive performance on real data, we use it to build two architectures. First, an attention mechanism for visual classification. Second, we implement a method for differentiable sorting: specifically, learning to sort unlabeled MNIST digits, given only the correct order.