Backpropagation


Decoupled Parallel Backpropagation with Convergence Guarantee

arXiv.org Machine Learning

The backpropagation algorithm is indispensable for training feedforward neural networks. It requires propagating error gradients sequentially from the output layer all the way back to the input layer. This backward locking prevents us from updating network layers in parallel and from fully leveraging the available computing resources. Recently, several algorithms have been proposed to break the backward locking. However, their performance degrades severely when networks are deep. In this paper, we propose a decoupled parallel backpropagation algorithm for deep learning optimization with a convergence guarantee. First, we decouple the backpropagation algorithm using delayed gradients, and show that the backward locking is removed when we split the network into multiple modules. Then, we apply decoupled parallel backpropagation to two stochastic methods and prove that our method guarantees convergence to critical points for the non-convex problem. Finally, we perform experiments training deep convolutional neural networks on benchmark datasets. The experimental results not only confirm our theoretical analysis, but also demonstrate that the proposed method can achieve significant speedup without loss of accuracy.
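The key ingredient, updating weights with gradients that are a few steps stale, can be illustrated on a toy problem. This is a minimal sketch, assuming a fixed delay, a one-dimensional quadratic objective, and plain gradient descent; it only shows that stale gradients can still converge with a small enough step size, and it is not the paper's multi-module algorithm. The function name `delayed_gradient_descent` is hypothetical.

```python
from collections import deque


def delayed_gradient_descent(grad, w0, lr=0.1, delay=2, steps=200):
    """Gradient descent in which every update uses the gradient from `delay` steps earlier."""
    w = w0
    # Past iterates, oldest first; initially filled with w0.
    history = deque([w0] * (delay + 1), maxlen=delay + 1)
    for _ in range(steps):
        stale_w = history[0]            # the iterate from `delay` steps ago
        w = w - lr * grad(stale_w)      # apply the stale gradient to the current weights
        history.append(w)               # maxlen drops the oldest entry automatically
    return w


# Toy objective f(w) = (w - 3)^2 / 2, whose gradient is w - 3
w_final = delayed_gradient_descent(lambda w: w - 3.0, w0=0.0)
print(round(w_final, 4))  # prints 3.0
```

With a larger step size or a longer delay the same loop can oscillate or diverge, which is why analyses of delayed-gradient methods typically tie the admissible step size to the delay.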


Backpropagation on a convolutional layer

#artificialintelligence

Online tutorials describe in depth the convolution of an image with a filter, etc.; however, I have not seen one that describes backpropagation through the filter (at least not visually). First, let me try to explain how I understand backpropagation in a fully connected network. The last partial derivative is the most interesting one in this case ... and it is equal to the value of the first input (a single value). The original question was how one performs backpropagation on a convolutional layer, for example $$\frac{\partial \text{Error}}{\partial W_1}?$$ The convolutional layer as described online.
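For the question above, the filter gradient can be written out directly. This is a minimal sketch, assuming a single-channel input, one filter, a stride-1 "valid" cross-correlation (what most deep learning libraries call convolution) and a squared-error loss; all helper names and numbers are hypothetical. Each weight W[a][b] multiplies the input value x[i + a][j + b] at every output position (i, j), so its gradient is the sum of those input values weighted by the output gradient, which is itself a valid cross-correlation of the input with the output gradient. A finite-difference check confirms the result.

```python
def conv2d_valid(x, w):
    """Stride-1 'valid' 2-D cross-correlation (what deep learning libraries usually call convolution)."""
    out_h = len(x) - len(w) + 1
    out_w = len(x[0]) - len(w[0]) + 1
    return [[sum(x[i + a][j + b] * w[a][b]
                 for a in range(len(w)) for b in range(len(w[0])))
             for j in range(out_w)]
            for i in range(out_h)]


def filter_gradient(x, grad_out):
    """dE/dW[a][b] = sum over (i, j) of dE/dY[i][j] * x[i + a][j + b].

    That sum is a valid cross-correlation of the input with the output gradient.
    """
    return conv2d_valid(x, grad_out)


def squared_error(x, w, target):
    y = conv2d_valid(x, w)
    return 0.5 * sum((y[i][j] - target[i][j]) ** 2
                     for i in range(len(y)) for j in range(len(y[0])))


x = [[1.0, 2.0, 0.0, 1.0],
     [0.0, 1.0, 3.0, 2.0],
     [2.0, 1.0, 0.0, 1.0],
     [1.0, 0.0, 2.0, 3.0]]
w = [[0.5, -1.0],
     [0.25, 2.0]]
target = [[0.0] * 3 for _ in range(3)]

y = conv2d_valid(x, w)
grad_out = [[y[i][j] - target[i][j] for j in range(3)] for i in range(3)]  # dE/dY
analytic = filter_gradient(x, grad_out)

# Central finite differences on each filter weight, as a sanity check
eps = 1e-6
numeric = [[0.0, 0.0], [0.0, 0.0]]
for a in range(2):
    for b in range(2):
        w_plus = [row[:] for row in w]
        w_minus = [row[:] for row in w]
        w_plus[a][b] += eps
        w_minus[a][b] -= eps
        numeric[a][b] = (squared_error(x, w_plus, target)
                         - squared_error(x, w_minus, target)) / (2 * eps)
```

Because each filter weight is shared across all output positions, its gradient is a sum over positions; in the fully connected case that sum has a single term, which is why there the derivative reduces to one input value.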


On Using Backpropagation for Speech Texture Generation and Voice Conversion

arXiv.org Machine Learning

Inspired by recent work on neural network image generation that relies on backpropagation to the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and matching of neuron-activation statistics between different source and target utterances. Similar to image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end, we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system can extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or reconstruct an utterance in a different voice.
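The first mechanism, backpropagating to the input to approximately invert a fixed representation, can be shown on a toy model. This is a minimal sketch, assuming a hypothetical fixed one-layer tanh "network" in place of the paper's convolutional CTC recognizer and a two-number "signal" in place of waveform samples; the weights stay frozen and only the input is updated by gradient descent.

```python
import math

# Hypothetical fixed "network": one linear layer followed by tanh
W = [[1.0, 0.5],
     [-0.3, 0.8]]


def features(x):
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W]


def feature_loss(x, target_feats):
    return 0.5 * sum((fk - tk) ** 2 for fk, tk in zip(features(x), target_feats))


def invert(target_feats, x0, lr=0.5, steps=500):
    """Gradient descent on the input x to minimize 0.5 * ||features(x) - target_feats||^2."""
    x = list(x0)
    for _ in range(steps):
        f = features(x)
        # Backpropagate through tanh: dL/dz_k = (f_k - t_k) * (1 - f_k^2)
        dz = [(fk - tk) * (1 - fk ** 2) for fk, tk in zip(f, target_feats)]
        # Then through the frozen linear layer to the input: dL/dx_j = sum_k W[k][j] * dz_k
        grad_x = [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(x))]
        x = [xj - lr * gj for xj, gj in zip(x, grad_x)]
    return x


x_true = [0.4, -0.2]            # hypothetical "utterance" to reconstruct from its features
target = features(x_true)
x_rec = invert(target, [0.0, 0.0])
```

In the paper's setting the same loop would run over waveform samples, with the differentiable mel-filterbank pipeline and recognizer in place of `features`, and the statistics-matching mechanism would add a further term to the cost.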


On Blackbox Backpropagation and Jacobian Sensing

Neural Information Processing Systems

From a small number of calls to a given "blackbox" on random input perturbations, we show how to efficiently recover its unknown Jacobian, or estimate the left action of its Jacobian on a given vector. Our methods are based on a novel combination of compressed sensing and graph coloring techniques, and provably exploit structural prior knowledge about the Jacobian such as sparsity and symmetry while being noise robust. We demonstrate efficient backpropagation through noisy blackbox layers in a deep neural net, improved data-efficiency in the task of linearizing the dynamics of a rigid body system, and the generic ability to handle a rich class of input-output dependency structures in Jacobian estimation problems.
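The graph-coloring half of this idea has a classical, much simpler relative: when the Jacobian's sparsity pattern is known, columns that never share a nonzero row can be perturbed together, so one blackbox call recovers several columns at once. This is a minimal sketch, assuming a known sparsity pattern and a noise-free blackbox; it is plain forward-difference estimation with greedy column coloring (in the spirit of the Curtis-Powell-Reid approach), not the paper's compressed-sensing method, and the function names and example blackbox are hypothetical.

```python
def greedy_column_coloring(sparsity):
    """Give columns that share a possibly nonzero row different colors (greedy order)."""
    colors = {}
    for j in range(len(sparsity)):
        taken = {colors[k] for k in colors if sparsity[k] & sparsity[j]}
        c = 0
        while c in taken:
            c += 1
        colors[j] = c
    return colors


def estimate_sparse_jacobian(f, x, sparsity, h=1e-6):
    """Forward-difference Jacobian that perturbs one color group of columns per call."""
    fx = f(x)
    m, n = len(fx), len(x)
    colors = greedy_column_coloring(sparsity)
    groups = sorted(set(colors.values()))
    J = [[0.0] * n for _ in range(m)]
    for c in groups:
        members = [j for j in range(n) if colors[j] == c]
        x_pert = list(x)
        for j in members:
            x_pert[j] += h               # perturb every column in the group at once
        f_pert = f(x_pert)
        for j in members:
            for i in sparsity[j]:        # no other column in the group touches row i
                J[i][j] = (f_pert[i] - fx[i]) / h
    return J, len(groups)


def blackbox(x):
    """Hypothetical blackbox whose Jacobian sparsity pattern is known."""
    return [x[0] ** 2 + x[1], 3 * x[1], x[2] * x[3], 2 * x[3]]


# sparsity[j]: rows in which column j of the Jacobian may be nonzero
sparsity = [{0}, {0, 1}, {2}, {2, 3}]
J, n_groups = estimate_sparse_jacobian(blackbox, [1.0, 2.0, 3.0, 4.0], sparsity)
```

Here four columns are recovered from two perturbed calls (plus one base call) instead of four. The paper goes further, exploiting structure such as sparsity and symmetry through compressed sensing while staying robust to noise.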


A Visual Explanation of the Back Propagation Algorithm for Neural Networks

@machinelearnbot

Let's assume we are really into mountain climbing, and to add a little extra challenge, we cover our eyes this time so that we can't see where we are or tell when we have accomplished our "objective," that is, reaching the top of the mountain. Since we can't see the path ahead, we let our intuition guide us: assuming that the mountain top is the "highest" point of the mountain, we think that the steepest path leads us to the top most efficiently. We approach this challenge by iteratively "feeling" around us and taking a step in the direction of steepest ascent; let's call it "gradient ascent." But what do we do if we reach a point where we can't ascend any further, that is, where every direction leads downward?
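In the analogy, "gradient ascent" means repeatedly stepping in the direction the slope points. This is a minimal sketch, assuming a hypothetical one-dimensional "mountain" with two peaks of different heights; it shows the situation the paragraph ends on: where the climb stops depends on where it starts, and at a peak the slope is zero, so every direction leads downward. Neural network training runs the same procedure as gradient descent on the error.

```python
def f(x):
    """Hypothetical 1-D "mountain" with two peaks; the right-hand peak is higher."""
    return -(x ** 2 - 1) ** 2 + 0.3 * x


def df(x):
    """Slope of f, used as the direction of steepest ascent."""
    return -4 * x * (x ** 2 - 1) + 0.3


def gradient_ascent(x, lr=0.05, steps=1000):
    for _ in range(steps):
        x += lr * df(x)      # step uphill, proportionally to the slope
    return x


left = gradient_ascent(-1.5)   # ends on the lower, local peak (near x = -0.96)
right = gradient_ascent(0.5)   # ends on the higher peak (near x = 1.04)
```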


Learning in the Machine: Random Backpropagation and the Deep Learning Channel

arXiv.org Artificial Intelligence

Random backpropagation (RBP) is a variant of the backpropagation algorithm for training neural networks, where the transposes of the forward matrices are replaced by fixed random matrices in the calculation of the weight updates. It is remarkable both because of its effectiveness, in spite of using random matrices to communicate error information, and because it completely removes the taxing requirement of maintaining symmetric weights in a physical neural system. To better understand random backpropagation, we first connect it to the notions of local learning and learning channels. Through this connection, we derive several alternatives to RBP, including skipped RBP (SRBP), adaptive RBP (ARBP), sparse RBP, and their combinations (e.g. ASRBP), and analyze their computational complexity. We then study their behavior through simulations using the MNIST and CIFAR-10 benchmark datasets. These simulations show that most of these variants work robustly, almost as well as backpropagation, and that multiplication by the derivatives of the activation functions is important. As a follow-up, we also study the low end of the number of bits required to communicate error information over the learning channel. We then provide partial intuitive explanations for some of the remarkable properties of RBP and its variations. Finally, we prove several mathematical results, including the convergence to fixed points of linear chains of arbitrary length, the convergence to fixed points of linear autoencoders with decorrelated data, the long-term existence of solutions for linear systems with a single hidden layer and convergence in special cases, and the convergence to fixed points of non-linear chains, when the derivative of the activation functions is included.
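The core RBP substitution fits in a few lines. This is a minimal sketch, assuming a hypothetical network with one tanh hidden layer, a scalar linear output, and a synthetic linear regression target: the only change from ordinary backpropagation is that the output weights `W2`, used to send the error back to the hidden layer, are swapped for a fixed random vector `B`. The activation derivative `(1 - h**2)` is kept, in line with the abstract's observation that it matters. Sizes, learning rate and data are illustrative, not taken from the paper.

```python
import math
import random


def train(use_random_feedback, epochs=200, lr=0.02, seed=0):
    """Train a 3-4-1 tanh network with backprop (False) or random backprop (True)."""
    rng = random.Random(seed)
    n_in, n_hid = 3, 4
    a = [1.0, -2.0, 0.5]  # hypothetical linear target y = a . x
    data = []
    for _ in range(50):
        x = [rng.uniform(-1, 1) for _ in range(n_in)]
        data.append((x, sum(ai * xi for ai, xi in zip(a, x))))
    W1 = [[rng.gauss(0, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    W2 = [rng.gauss(0, 0.5) for _ in range(n_hid)]
    B = [rng.gauss(0, 0.5) for _ in range(n_hid)]  # fixed random feedback, never trained

    def hidden(x):
        return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W1]

    def mse():
        return sum((sum(w * hk for w, hk in zip(W2, hidden(x))) - y) ** 2
                   for x, y in data) / len(data)

    initial = mse()
    for _ in range(epochs):
        for x, y in data:
            h = hidden(x)
            err = sum(w * hk for w, hk in zip(W2, h)) - y  # dE/dy_hat for E = err**2 / 2
            # Backprop sends the error back through W2 (transposed); RBP uses B instead
            back = B if use_random_feedback else W2
            delta_h = [back[k] * err * (1 - h[k] ** 2) for k in range(n_hid)]
            for k in range(n_hid):
                W2[k] -= lr * err * h[k]
                for j in range(n_in):
                    W1[k][j] -= lr * delta_h[k] * x[j]
    return initial, mse()


bp_initial, bp_final = train(use_random_feedback=False)
rbp_initial, rbp_final = train(use_random_feedback=True)
```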


Neural Network Foundations, Explained: Updating Weights with Gradient Descent & Backpropagation

@machinelearnbot

Recall that in order for a neural network to learn, the weights associated with neuron connections must be updated after forward passes of data through the network. These weights are adjusted to help reconcile the differences between the actual and predicted outcomes for subsequent forward passes. But how, exactly, do the weights get adjusted? Before we get to the actual adjustments, think about what each neuron would need in order to make a meaningful change to a given weight. Since we are talking about the difference between actual and predicted values, the error is a useful measure here, so each neuron needs its respective error to be sent backward through the network to it in order to facilitate the update process; hence, backpropagation of error.
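The "error sent backward" can be made concrete with a hypothetical chain of two sigmoid neurons and a squared-error loss. This is a minimal sketch, assuming one scalar weight per neuron and a single training example: the output neuron's error term is multiplied by the connecting weight and the hidden neuron's derivative to give the hidden neuron's error term, and each weight then moves against its gradient.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(w1, w2, x):
    h = sigmoid(w1 * x)      # hidden neuron
    y_hat = sigmoid(w2 * h)  # output neuron
    return h, y_hat


def train_step(w1, w2, x, y, lr=0.5):
    h, y_hat = forward(w1, w2, x)
    # Error term at the output for E = 0.5 * (y_hat - y)^2
    delta_out = (y_hat - y) * y_hat * (1 - y_hat)
    # Error sent backward to the hidden neuron through w2
    delta_hidden = delta_out * w2 * h * (1 - h)
    # Gradient-descent updates: w <- w - lr * dE/dw
    return w1 - lr * delta_hidden * x, w2 - lr * delta_out * h


def error(w1, w2, x, y):
    return 0.5 * (forward(w1, w2, x)[1] - y) ** 2


w1, w2, x, y = 0.5, -0.3, 1.0, 0.9
start = error(w1, w2, x, y)
for _ in range(1000):
    w1, w2 = train_step(w1, w2, x, y)
end = error(w1, w2, x, y)
```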


Why we should be Deeply Suspicious of BackPropagation

#artificialintelligence

Geoffrey Hinton has finally expressed what many have been uneasy about. At a recent AI conference, Hinton remarked that he was "deeply suspicious" of back-propagation, and said "My view is throw it all away and start again." Backpropagation has become the bread-and-butter mechanism of Deep Learning. Researchers have discovered that one can employ any computation layer in a solution, the only requirement being that the layer must be differentiable. In other words, one must be able to calculate the gradient of the layer.


Neural Networks: The Backpropagation algorithm in a picture - DeepMarketing.io

#artificialintelligence

Here I present the backpropagation algorithm for a continuous target variable and no activation function in the hidden layer: although simpler than the one used for the logistic cost function, it's a fruitful field for math lovers.…
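Since the picture itself is not available, the same setting can be written out as equations. This is a sketch using notation of our own choosing rather than the original figure's: input $x$, hidden weights $W^{(1)}$, output weights $w^{(2)}$, a single continuous output $\hat{y}$ with squared error, and learning rate $\eta$.

```latex
\begin{aligned}
h &= W^{(1)} x, \qquad \hat{y} = w^{(2)\top} h, \qquad E = \tfrac{1}{2}(\hat{y} - y)^2 \\
\delta &= \frac{\partial E}{\partial \hat{y}} = \hat{y} - y \\
\frac{\partial E}{\partial w^{(2)}} &= \delta\, h, \qquad
\frac{\partial E}{\partial h} = \delta\, w^{(2)} \\
\frac{\partial E}{\partial W^{(1)}} &= \delta\, w^{(2)} x^{\top} \\
w^{(2)} &\leftarrow w^{(2)} - \eta\, \delta\, h, \qquad
W^{(1)} \leftarrow W^{(1)} - \eta\, \delta\, w^{(2)} x^{\top}
\end{aligned}
```

With no hidden activation, $\partial E/\partial h$ carries no derivative factor such as $\sigma'(z)$, which is what makes this case simpler than the logistic one.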


Is it possible to train a neural network without backpropagation?

@machinelearnbot

The first two algorithms you mention (Nelder-Mead and Simulated Annealing) are generally considered pretty much obsolete in optimization circles, as there are much better alternatives that are both more reliable and less costly. Genetic algorithms cover a wide range, and some of these can be reasonable. However, in the broader class of derivative-free optimization algorithms, there are many that are significantly better than these "classics", as this has been an active area of research in recent decades. So, might some of these newer approaches be reasonable for deep learning? This is a nice paper that has many interesting insights into recent techniques.
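As a concrete example of training without gradients, here is a (1+1) evolution strategy with the classic one-fifth success rule fitting a two-parameter linear model. This is a minimal sketch, assuming a hypothetical toy dataset; it uses only loss evaluations, never derivatives. It is a deliberately simple baseline, not one of the newer derivative-free methods the answer refers to.

```python
import random


def one_plus_one_es(loss, params, sigma=0.5, iters=2000, seed=0):
    """(1+1) evolution strategy: mutate, keep the child only if it is better."""
    rng = random.Random(seed)
    best, best_loss = list(params), loss(params)
    for _ in range(iters):
        cand = [p + sigma * rng.gauss(0, 1) for p in best]   # Gaussian mutation
        cand_loss = loss(cand)
        if cand_loss < best_loss:            # success: accept and widen the search
            best, best_loss = cand, cand_loss
            sigma *= 1.5
        else:                                # failure: narrow the search
            sigma *= 1.5 ** -0.25            # balances at a 1/5 success rate
    return best, best_loss


xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
ys = [2 * x + 1 for x in xs]                 # hypothetical data from y = 2x + 1


def mse(p):
    w, b = p
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


params, final = one_plus_one_es(mse, [0.0, 0.0])
```

Because a candidate replaces the current point only when it lowers the loss, the loss never increases, and the step-size rule lets the search shrink as it closes in on the minimum.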