Backpropagation
[Discussion] What are the problems of the backpropagation algorithm? • r/MachineLearning
Two days ago, an article quoting Hinton as saying that backprop is not necessarily the way to go for AI generated lots of very cool discussion on this subreddit (here). The discussion mainly went in the direction of asking what the alternatives to backprop are. In this discussion I would like us to answer the question: what are the problems of backprop? Here's my initial input: Problems that seem to be intrinsic to backprop: Problems that currently pose lots of difficulties and we're not sure are possible with backprop: Do you agree with these? What other problems have you noticed during your work?
Spatio-Temporal Backpropagation for Training High-performance Spiking Neural Networks
Wu, Yujie, Deng, Lei, Li, Guoqi, Zhu, Jun, Shi, Luping
Compared with artificial neural networks (ANNs), spiking neural networks (SNNs) are promising to explore the brain-like behaviors since the spikes could encode more spatio-temporal information. Although pre-training from ANN or direct training based on backpropagation (BP) makes the supervised training of SNNs possible, these methods only exploit the networks' spatial domain information which leads to the performance bottleneck and requires many complicated training skills. Another fundamental issue is that the spike activity is naturally non-differentiable which causes great difficulties in training SNNs. To this end, we build an iterative LIF model that is more friendly for gradient descent training. By simultaneously considering the layer-by-layer spatial domain (SD) and the timing-dependent temporal domain (TD) in the training phase, as well as an approximated derivative for the spike activity, we propose a spatio-temporal backpropagation (STBP) training framework without using any complicated technology. We achieve the best performance of multi-layered perceptron (MLP) compared with existing state-of-the-art algorithms over the static MNIST and the dynamic N-MNIST dataset as well as a custom object detection dataset. This work provides a new perspective to explore the high-performance SNNs for future brain-like computing paradigm with rich spatio-temporal dynamics.
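As a rough illustration of the two ingredients the abstract names (an iterative LIF update and an approximated derivative for the non-differentiable spike), here is a minimal single-neuron sketch. The update rule, the reset-to-zero behaviour, the rectangular surrogate, and all names and constants (`decay`, `v_th`, `width`) are assumptions chosen for illustration, not the authors' exact formulation.

```python
def simulate_lif(inputs, decay=0.5, v_th=1.0):
    """Iterate a leaky integrate-and-fire neuron over a list of input currents.

    Assumed update (hypothetical form): u_t = decay * u_{t-1} * (1 - o_{t-1}) + x_t,
    so the membrane potential leaks each step and is reset to zero after a spike.
    """
    u, o = 0.0, 0
    potentials, spikes = [], []
    for x in inputs:
        u = decay * u * (1 - o) + x      # leak, reset if the previous step spiked
        o = 1 if u >= v_th else 0        # hard threshold: non-differentiable
        potentials.append(u)
        spikes.append(o)
    return potentials, spikes


def surrogate_grad(u, v_th=1.0, width=1.0):
    """Rectangular stand-in for d(spike)/d(u): 1/width near the threshold, else 0."""
    return 1.0 / width if abs(u - v_th) < width / 2 else 0.0


potentials, spikes = simulate_lif([0.6] * 6)
print(spikes)
print([surrogate_grad(u) for u in potentials])
```

In a backward pass, `surrogate_grad` would replace the true derivative of the threshold, which is zero almost everywhere, so that error signals can flow both through layers and through time steps.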
General Backpropagation Algorithm for Training Second-order Neural Networks
Fan, Fenglei, Cong, Wenxiang, Wang, Ge
The artificial neural network is a popular framework in machine learning. To empower individual neurons, we recently suggested that the current type of neurons could be upgraded to 2nd-order counterparts, in which the linear operation between inputs to a neuron and the associated weights is replaced with a nonlinear quadratic operation. A single 2nd-order neuron already has a strong nonlinear modeling ability, such as implementing basic fuzzy logic operations. In this paper, we develop a general backpropagation (BP) algorithm to train a network consisting of 2nd-order neurons. Numerical studies are performed to verify the generalized BP algorithm.
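To make the idea of a quadratic neuron concrete, the sketch below assumes one plausible form, a product of two affine terms plus a weighted sum of squared inputs, passed through a sigmoid. That form and the parameter names (`wr`, `br`, `wg`, `bg`, `wb`, `c`) are assumptions for illustration and may differ from the paper's definition. The chain-rule gradient is checked against central differences.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward(p, x):
    """Assumed neuron: y = sigmoid((wr.x + br) * (wg.x + bg) + wb.(x*x) + c)."""
    r = dot(p["wr"], x) + p["br"]
    g = dot(p["wg"], x) + p["bg"]
    q = dot(p["wb"], [xi * xi for xi in x])
    return sigmoid(r * g + q + p["c"]), r, g


def loss(p, x, target):
    y, _, _ = forward(p, x)
    return 0.5 * (y - target) ** 2


def backward(p, x, target):
    """Gradient of the loss with respect to every parameter, via the chain rule."""
    y, r, g = forward(p, x)
    delta = (y - target) * y * (1.0 - y)        # dL/dz at the sigmoid input
    return {
        "wr": [delta * g * xi for xi in x],     # z depends on wr through r * g
        "br": delta * g,
        "wg": [delta * r * xi for xi in x],
        "bg": delta * r,
        "wb": [delta * xi * xi for xi in x],
        "c": delta,
    }


def numeric_grad(p, x, target, key, index=None, h=1e-6):
    """Central-difference estimate for one parameter, used as a sanity check."""
    def shifted(step):
        q = {k: (list(v) if isinstance(v, list) else v) for k, v in p.items()}
        if index is None:
            q[key] += step
        else:
            q[key][index] += step
        return loss(q, x, target)
    return (shifted(h) - shifted(-h)) / (2 * h)


params = {"wr": [0.3, -0.2], "br": 0.1, "wg": [0.5, 0.4], "bg": -0.3,
          "wb": [0.2, -0.1], "c": 0.05}
grads = backward(params, [1.0, 2.0], 1.0)
print(grads["wr"], numeric_grad(params, [1.0, 2.0], 1.0, "wr", 0))
```

The only difference from a first-order neuron is in the local derivatives: each affine factor's gradient is scaled by the value of the other factor.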
Learning Local Feature Aggregation Functions with Backpropagation
Katharopoulos, Angelos, Paschalidou, Despoina, Diou, Christos, Delopoulos, Anastasios
This paper introduces a family of local feature aggregation functions and a novel method to estimate their parameters, such that they generate optimal representations for classification (or any task that can be expressed as a cost function minimization problem). To achieve that, we compose the local feature aggregation function with the classifier cost function and we backpropagate the gradient of this cost function in order to update the local feature aggregation function parameters. Experiments on synthetic datasets indicate that our method discovers parameters that model the class-relevant information in addition to the local feature space. Further experiments on a variety of motion and visual descriptors, both on image and video datasets, show that our method outperforms other state-of-the-art local feature aggregation functions, such as Bag of Words, Fisher Vectors and VLAD, by a large margin. A typical image or video classification pipeline, which uses handcrafted features, consists of the following components: local feature extraction (e.g.
[R] "Unbiasing Truncated Backpropagation Through Time", Tallec & Ollivier 2017 • r/MachineLearning
The big point here is that we are improving the optimization approach by adding clever noise into the gradient. By sampling different truncation lengths, the gradient estimate we obtain becomes stochastic. It doesn't come as much of a surprise that adding noise does slow down the training procedure. However, as mentioned, the noise we introduce is not just any noise: it provides unbiasedness. Notably, this means that ARTBP considers some minima that Truncated Backprop does not see as minima, as it is biased.
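The unbiasedness claim can be checked on a toy problem. The sketch below assumes a scalar linear recurrence h_t = a * h_{t-1} + x_t with loss 0.5 * sum(h_t^2), a constant truncation probability `c` at every step, and a compensation factor 1 / (1 - c) applied whenever the gradient is not cut. These choices and all function names are illustrative assumptions, not the paper's exact scheme. Instead of sampling, it enumerates every truncation pattern to get the exact expectation.

```python
import itertools


def forward(a, xs):
    """Return hidden states hs[0..T] with hs[0] = 0."""
    hs = [0.0]
    for x in xs:
        hs.append(a * hs[-1] + x)
    return hs


def total_loss(a, xs):
    return sum(0.5 * h * h for h in forward(a, xs)[1:])


def grad_with_cuts(a, xs, cuts, scale):
    """dL/da by backprop through time, where cuts[t-1] blocks the flow from
    step t+1 back to step t; surviving links are multiplied by `scale`."""
    hs = forward(a, xs)
    T = len(xs)
    delta = hs[T]                       # dL/dh_T
    grad = delta * hs[T - 1]
    for t in range(T - 1, 0, -1):
        flow = 0.0 if cuts[t - 1] else scale * a * delta
        delta = hs[t] + flow            # local loss term plus whatever flows back
        grad += delta * hs[t - 1]
    return grad


def full_gradient(a, xs):
    return grad_with_cuts(a, xs, [False] * (len(xs) - 1), 1.0)


def expected_gradient(a, xs, c, compensate):
    """Exact expectation over all truncation patterns with cut probability c."""
    scale = 1.0 / (1.0 - c) if compensate else 1.0
    total = 0.0
    for cuts in itertools.product([False, True], repeat=len(xs) - 1):
        prob = 1.0
        for cut in cuts:
            prob *= c if cut else (1.0 - c)
        total += prob * grad_with_cuts(a, xs, cuts, scale)
    return total


xs = [1.0, 0.5, -0.3, 0.8, 0.2, 1.1]
print("full BPTT          ", full_gradient(0.9, xs))
print("random cuts, plain ", expected_gradient(0.9, xs, 0.3, compensate=False))
print("random cuts, scaled", expected_gradient(0.9, xs, 0.3, compensate=True))
```

With compensation, the expectation matches the full gradient; without it, the estimate is systematically off, which is the bias the comment refers to. The price is extra variance in each individual sample.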
Fast Second-Order Stochastic Backpropagation for Variational Inference
Fan, Kai, Wang, Ziteng, Beck, Jeff, Kwok, James, Heller, Katherine
We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparameterization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous functions. Our method is practical, scalable and model-free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates.
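For readers unfamiliar with the reparameterization trick the abstract builds on, here is a minimal first-order sketch for a one-dimensional Gaussian. It does not implement the paper's second-order method. The function name `reparam_grads`, the test function f(z) = z^2, and the sample count are assumptions chosen so the result can be compared with a closed form.

```python
import random


def reparam_grads(mu, sigma, df, n_samples=20000, seed=0):
    """Monte Carlo gradients of E[f(z)], z ~ N(mu, sigma^2), with respect to
    mu and sigma, using z = mu + sigma * eps with eps ~ N(0, 1).

    df is the derivative of f. Since dz/dmu = 1 and dz/dsigma = eps, the
    estimators are the sample means of df(z) and df(z) * eps.
    """
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps            # randomness is moved into eps
        d = df(z)
        g_mu += d
        g_sigma += d * eps
    return g_mu / n_samples, g_sigma / n_samples


# For f(z) = z^2, E[f] = mu^2 + sigma^2, so the exact gradients are (2*mu, 2*sigma).
g_mu, g_sigma = reparam_grads(1.0, 0.5, lambda z: 2.0 * z)
print(g_mu, g_sigma, "exact:", 2.0 * 1.0, 2.0 * 0.5)
```

Because the sample `z` is written as a deterministic function of the parameters and parameter-free noise, ordinary backpropagation can be applied through the sampling step.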
Application of backpropagation neural networks to both stages of fingerprinting based WIPS
We propose a scheme to employ backpropagation neural networks (BPNNs) for both stages of fingerprinting-based indoor positioning using WLAN/WiFi signal strengths (FWIPS): radio map construction during the offline stage, and localization during the online stage. Given a training radio map (TRM), i.e., a set of coordinate vectors and associated WLAN/WiFi signal strengths of the available access points, a BPNN can be trained to output the expected signal strengths for any input position within the region of interest (BPNN-RM). This can be used to provide a continuous representation of the radio map and to filter, densify or decimate a discrete radio map. Correspondingly, the TRM can also be used to train another BPNN to output the expected position within the region of interest for any input vector of recorded signal strengths and thus carry out localization (BPNN-LA). Key aspects of the design of such artificial neural networks for a specific application are the selection of design parameters like the number of hidden layers and nodes within the network, and the training procedure. Summarizing extensive numerical simulations, based on real measurements in a testbed, we analyze the impact of these design choices on the performance of the BPNN and compare the results in particular to those obtained using the $k$ nearest neighbors ($k$NN) and weighted $k$ nearest neighbors approaches to FWIPS.
Neural Network Gradients: Backpropagation, Dual Numbers, Finite Differences
In the post How to Train Neural Networks With Backpropagation I said that you could also calculate the gradient of a neural network by using dual numbers or finite differences. The post I already linked to explains backpropagation. Since the fundamentals are explained in the links above, we'll go straight to the code. We'll be getting the gradient (learning values) for the network in example 4 in the backpropagation post. Note that I am using "central differences" for the gradient, but it would be more efficient to do a forward or backward difference, at the cost of some accuracy. I didn't compare the running times of each method, as my code is meant to be readable, not fast, and the code isn't doing enough work to make a meaningful performance test, IMO.
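The three approaches can be compared on something much smaller than the network from the post. The sketch below uses a hypothetical one-neuron network (a sigmoid of `w * x + b` with a squared-error loss); the `Dual` class, the function names, and the constants are all assumptions for illustration, not the post's code.

```python
import math


class Dual:
    """Dual number val + der * eps with eps**2 == 0; der carries a derivative."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    @staticmethod
    def lift(x):
        return x if isinstance(x, Dual) else Dual(x)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.val + other.val, self.der + other.der)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (a + a'eps)(b + b'eps) = ab + (a b' + a' b) eps
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__


def sigmoid(x):
    if isinstance(x, Dual):
        s = 1.0 / (1.0 + math.exp(-x.val))
        return Dual(s, s * (1.0 - s) * x.der)
    return 1.0 / (1.0 + math.exp(-x))


def neuron_loss(w, b, x=0.5, target=0.8):
    """Works on floats or Duals: 0.5 * (sigmoid(w*x + b) - target)**2."""
    diff = sigmoid(w * x + b) + (-target)
    return diff * diff * 0.5


def grad_dual(w, b):
    """One forward pass per parameter, seeding that parameter's derivative with 1."""
    dw = neuron_loss(Dual(w, 1.0), Dual(b, 0.0)).der
    db = neuron_loss(Dual(w, 0.0), Dual(b, 1.0)).der
    return dw, db


def grad_central(w, b, h=1e-5):
    """Central differences: two loss evaluations per parameter."""
    dw = (neuron_loss(w + h, b) - neuron_loss(w - h, b)) / (2 * h)
    db = (neuron_loss(w, b + h) - neuron_loss(w, b - h)) / (2 * h)
    return dw, db


def grad_backprop(w, b, x=0.5, target=0.8):
    """Hand-derived chain rule, for reference."""
    out = sigmoid(w * x + b)
    delta = (out - target) * out * (1.0 - out)
    return delta * x, delta


print("dual    ", grad_dual(0.3, -0.1))
print("central ", grad_central(0.3, -0.1))
print("backprop", grad_backprop(0.3, -0.1))
```

Dual numbers give the derivative to floating-point precision but need one pass per parameter. Central differences are approximate and need two evaluations per parameter, whereas a forward or backward difference can reuse one baseline evaluation, which is the efficiency trade-off mentioned above.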