Goto

Collaborating Authors

 Backpropagation


Associated Learning: Decomposing End-to-end Backpropagation based on Auto-encoders and Target Propagation

arXiv.org Machine Learning

Backpropagation has been widely used in deep learning approaches, but it is inefficient and sometimes unstable because of backward locking and vanishing/exploding gradient problems, especially when the gradient flow is long. Additionally, updating all edge weights based on a single objective seems biologically implausible. In this paper, we introduce a novel biologically motivated learning structure called Associated Learning, which modularizes the network into smaller components, each of which has a local objective. Because the objectives are mutually independent, Associated Learning can learn the parameters independently and simultaneously when these parameters belong to different components. Surprisingly, training deep models by Associated Learning yields comparable accuracies to models trained using typical backpropagation methods, which aims at fitting the target variable directly. Moreover, probably because the gradient flow of each component is short, deep networks can still be trained with Associated Learning even when some of the activation functions are sigmoid-a situation that usually results in the vanishing gradient problem when using typical backpropagation. We also found that the Associated Learning generates better metafeatures, which we demonstrated both quantitatively (via inter-class and intra-class distance comparisons in the hidden layers) and qualitatively (by visualizing the hidden layers using t-SNE).


Memorized Sparse Backpropagation

arXiv.org Machine Learning

Neural network learning is typically slow since backpropagation needs to compute full gradients and backpropagate them across multiple layers. Despite its success of existing work in accelerating propagation through sparseness, the relevant theoretical characteristics remain unexplored and we empirically find that they suffer from the loss of information contained in unpropagated gradients. To tackle these problems, in this work, we present a unified sparse backpropagation framework and provide a detailed analysis of its theoretical characteristics. Analysis reveals that when applied to a multilayer perceptron, our framework essentially performs gradient descent using an estimated gradient similar enough to the true gradient, resulting in convergence in probability under certain conditions. Furthermore, a simple yet effective algorithm named memorized sparse backpropagation (MSBP) is proposed to remedy the problem of information loss by storing unpropagated gradients in memory for the next learning. The experiments demonstrate that the proposed MSBP is able to effectively alleviate the information loss in traditional sparse backpropagation while achieving comparable acceleration.


A Graph Theoretic Framework of Recomputation Algorithms for Memory-Efficient Backpropagation

arXiv.org Machine Learning

Recomputation algorithms collectively refer to a family of methods that aims to reduce the memory consumption of the backpropagation by selectively discarding the intermediate results of the forward propagation and recomputing the discarded results as needed. In this paper, we will propose a novel and efficient recomputation method that can be applied to a wider range of neural nets than previous methods. We use the language of graph theory to formalize the general recomputation problem of minimizing the computational overhead under a fixed memory budget constraint, and provide a dynamic programming solution to the problem. Our method can reduce the peak memory consumption on various benchmark networks by 36% 81%, which outperforms the reduction achieved by other methods.


Synthesizing Images from Spatio-Temporal Representations using Spike-based Backpropagation

arXiv.org Machine Learning

Spiking neural networks (SNNs) offer a promising alternative to current artificial neural networks to enable low-power event-driven neuromorphic hardware. Spike-based neuromorphic applications require processing and extracting meaningful information from spatio-temporal data, represented as series of spike trains over time. In this paper, we propose a method to synthesize images from multiple modalities in a spike-based environment. We use spiking auto-encoders to convert image and audio inputs into compact spatio-temporal representations that is then decoded for image synthesis. For this, we use a direct training algorithm that computes loss on the membrane potential of the output layer and back-propagates it by using a sigmoid approximation of the neuron's activation function to enable differentiability. The spiking autoencoders are benchmarked on MNIST and Fashion-MNIST and achieve very low reconstruction loss, comparable to ANNs. Then, spiking autoencoders are trained to learn meaningful spatio-temporal representations of the data, across the two modalities - audio and visual. We synthesize images from audio in a spike-based environment by first generating, and then utilizing such shared multi-modal spatio-temporal representations. Our audio to image synthesis model is tested on the task of converting TI-46 digits audio samples to MNIST images. We are able to synthesize images with high fidelity and the model achieves competitive performance against ANNs.


Adaptively Truncating Backpropagation Through Time to Control Gradient Bias

arXiv.org Machine Learning

Truncated backpropagation through time (TBPTT) is a popular method for learning in recurrent neural networks (RNNs) that saves computation and memory at the cost of bias by truncating backpropagation after a fixed number of lags. In practice, choosing the optimal truncation length is difficult: TBPTT will not converge if the truncation length is too small, or will converge slowly if it is too large. We propose an adaptive TBPTT scheme that converts the problem from choosing a temporal lag to one of choosing a tolerable amount of gradient bias. For many realistic RNNs, the TBPTT gradients decay geometrically for large lags; under this condition, we can control the bias by varying the truncation length adaptively. For RNNs with smooth activation functions, we prove that this bias controls the convergence rate of SGD with biased gradients for our non-convex loss. Using this theory, we develop a practical method for adaptively estimating the truncation length during training. We evaluate our adaptive TBPTT method on synthetic data and language modeling tasks and find that our adaptive TBPTT ameliorates the computational pitfalls of fixed TBPTT.


ARSM: Augment-REINFORCE-Swap-Merge Estimator for Gradient Backpropagation Through Categorical Variables

arXiv.org Machine Learning

To address the challenge of backpropagating the gradient through categorical variables, we propose the augment-REINFORCE-swap-merge (ARSM) gradient estimator that is unbiased and has low variance. ARSM first uses variable augmentation, REINFORCE, and Rao-Blackwellization to re-express the gradient as an expectation under the Dirichlet distribution, then uses variable swapping to construct differently expressed but equivalent expectations, and finally shares common random numbers between these expectations to achieve significant variance reduction. Experimental results show ARSM closely resembles the performance of the true gradient for optimization in univariate settings; outperforms existing estimators by a large margin when applied to categorical variational auto-encoders; and provides a "try-and-see self-critic" variance reduction method for discrete-action policy gradient, which removes the need of estimating baselines by generating a random number of pseudo actions and estimating their action-value functions.


Embodied Event-Driven Random Backpropagation

arXiv.org Artificial Intelligence

Spike-based communication between biological neurons is sparse and unreliable. This enables the brain to process visual information from the eyes efficiently. Taking inspiration from biology, artificial spiking neural networks coupled with silicon retinas attempt to model these computations. Recent findings in machine learning allowed the derivation of a family of powerful synaptic plasticity rules approximating backpropagation for spiking networks. Are these rules capable of processing real-world visual sensory data? In this paper, we evaluate the performance of Event-Driven Random Backpropagation (eRBP) at learning representations from event streams provided by a Dynamic Vision Sensor (DVS). First, we show that eRBP matches state-of-the-art performance on DvsGesture with the addition of a simple covert attention mechanism. By remapping visual receptive fields relatively to the center of the motion, this attention mechanism provides translation invariance at low computational cost compared to convolutions. Second, we successfully integrate eRBP in a real robotic setup, where a robotic arm grasps objects with respect to detected visual affordances. In this setup, visual information is actively sensed by a DVS mounted on a robotic head performing microsaccadic eye movements. We show that our method quickly classifies affordances within 100ms after microsaccade onset, comparable to human performance reported in behavioral study. Our results suggest that advances in neuromorphic technology and plasticity rules enable the development of autonomous robots operating at high speed and low energy budget.


The Backpropagation Algorithm Demystified โ€“ Nathalie Jeans โ€“ Medium

#artificialintelligence

One of the most important aspects of machine learning is its ability to recognize error margins in its output and be able to interpret data more precisely as increasing numbers of datasets are fed through its neural network. Commonly referred to as backpropagation, it is a process that isn't as complex as you might think. The first thing people think of when they hear the term "Machine Learning" goes a little something like the Matrix. All around, there are computers taking over the world, let alone the human race. In any case, people generally just want nothing to do with it. What if I told you those people don't even know what machine learning and things like backpropagation really are?


Deep Learning โ€“ Backpropagation Algorithm Basics Vinod Sharma's Blog

#artificialintelligence

This algorithm uses supervised learning methods for training Artificial Neural Networks. The whole idea of training multi-layer perceptrons is to compute the derivatives of the error function or gradient descent with respect to weights using the backpropagation algorithm. This algorithm is actually based on the linear algebraic operation with a goal of optimising error function by harnessing its intelligence and provisioning updates. In this post, we will focus on backpropagation and basic details around it on a high level in simple English. As mentioned above "Backpropagation" is an algorithm which uses supervised learning methods to compute the gradient descent (delta rule) with respect to weights.


8 Tricks for Configuring Backpropagation to Train Better Neural Networks

#artificialintelligence

This section provides more resources on the topic if you are looking to go deeper. In this post, you discovered tips and tricks for getting the most out of the backpropagation algorithm when training neural network models. Have you tried any of these tricks on your projects? Let me know about your results in the comments below. Do you have any questions? Ask your questions in the comments below and I will do my best to answer.