 Backpropagation


On quantum backpropagation, information reuse, and cheating measurement collapse

arXiv.org Artificial Intelligence

The success of modern deep learning hinges on the ability to train neural networks at scale. Through clever reuse of intermediate information, backpropagation facilitates training through gradient computation at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters - which can now be in the trillions. Naively, one expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation. But recent developments in shadow tomography, which assumes access to multiple copies of a quantum state, have challenged that notion. Here, we investigate whether parameterized quantum models can train as efficiently as classical neural networks. We show that achieving backpropagation scaling is impossible without access to multiple copies of a state. With this added ability, we introduce an algorithm with foundations in shadow tomography that matches backpropagation scaling in quantum resources while reducing classical auxiliary computational costs to open problems in shadow tomography. These results highlight the nuance of reusing quantum information for practical purposes and clarify the unique difficulties in training large quantum models, which could alter the course of quantum machine learning.
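
As a purely classical illustration of the "reuse of intermediate information" mentioned in the abstract, the sketch below differentiates a toy chain h_{k+1} = tanh(theta_k * h_k). The chain, the function names, and the finite-difference baseline are assumptions made for this example and are not taken from the paper. The backward sweep reuses the stored activations to obtain every parameter gradient at a cost comparable to one forward pass, while the baseline needs two extra forward passes per parameter.

```python
import math

def forward(thetas, x):
    """Run h_{k+1} = tanh(theta_k * h_k) and keep every intermediate h_k."""
    hs = [x]
    for theta in thetas:
        hs.append(math.tanh(theta * hs[-1]))
    return hs

def backprop_grads(thetas, x):
    """All d(output)/d(theta_k) from one forward and one backward sweep."""
    hs = forward(thetas, x)
    grads = [0.0] * len(thetas)
    upstream = 1.0  # d(output)/d(h_last)
    for k in reversed(range(len(thetas))):
        local = 1.0 - hs[k + 1] ** 2             # tanh' reused from the stored activation
        grads[k] = upstream * local * hs[k]      # chain rule with respect to theta_k
        upstream = upstream * local * thetas[k]  # chain rule with respect to h_k
    return grads

def finite_difference_grads(thetas, x, eps=1e-6):
    """Baseline: two extra forward passes for every parameter."""
    grads = []
    for k in range(len(thetas)):
        up = list(thetas)
        up[k] += eps
        down = list(thetas)
        down[k] -= eps
        grads.append((forward(up, x)[-1] - forward(down, x)[-1]) / (2 * eps))
    return grads
```

Calling `backprop_grads([0.5, -1.2, 0.8], 0.3)` and `finite_difference_grads` with the same arguments should give matching values up to numerical error; only the second grows in cost with the number of parameters.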


Using Backpropagation with Temporal Windows to Learn the Dynamics of the CMU Direct-Drive Arm II

Neural Information Processing Systems

Computing the inverse dynamics of a robot arm is an active area of research in the control literature. We hope to learn the inverse dynamics by training a neural network on the measured response of a physical arm. The input to the network is a temporal window of measured positions; the output is a vector of torques. We train the network on data measured from the first two joints of the CMU Direct-Drive Arm II as it moves through a randomly generated sample of "pick-and-place" trajectories. We then test generalization with a new trajectory and compare the network's output with the torque measured at the physical arm.
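
One way to picture the temporal-window input is the sketch below, which turns a recorded trajectory into (window of positions, torque vector) training pairs. The function name `make_windows` and the choice to align each torque with the last sample of its window are assumptions for illustration; the abstract does not state how the authors aligned them.

```python
def make_windows(positions, torques, window):
    """Pair each run of `window` consecutive position samples with the torque
    vector recorded at the final sample of that run (alignment is assumed)."""
    if len(positions) != len(torques):
        raise ValueError("positions and torques must have the same length")
    inputs, targets = [], []
    for end in range(window, len(positions) + 1):
        # Flatten the window of per-joint positions into one input vector.
        flat = [value for sample in positions[end - window:end] for value in sample]
        inputs.append(flat)
        targets.append(torques[end - 1])
    return inputs, targets
```

For a two-joint arm, each entry of `positions` and `torques` would be a two-element list, so a window of three samples yields a six-element network input.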


Backpropagation and Its Application to Handwritten Signature Verification

Neural Information Processing Systems

A pool of handwritten signatures is used to train a neural network for the task of deciding whether or not a given signature is a forgery. The network is a feedforward net, with a binary image as input. There is a hidden layer, with a single-unit output layer. The weights are adjusted according to the backpropagation algorithm. The signatures are entered into a C software program through the use of a Datacopy Electronic Digitizing Camera.
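
The architecture described above (binary image in, one hidden layer, one output unit, weights adjusted by backpropagation) can be sketched in a few lines. Everything specific here is an assumption for illustration: the sigmoid units, the squared-error loss, the learning rate, and the `TinyNet` name are not taken from the paper, whose original program was written in C.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

class TinyNet:
    """Binary-pixel input -> one sigmoid hidden layer -> one sigmoid output unit."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(self.w1, self.b1)]
        out = sigmoid(sum(w * h for w, h in zip(self.w2, hidden)) + self.b2)
        return hidden, out

    def train_step(self, x, target, lr=0.5):
        hidden, out = self.forward(x)
        # Loss is 0.5 * (out - target)^2; the deltas follow from the chain rule.
        delta_out = (out - target) * out * (1.0 - out)
        delta_hidden = [delta_out * w * h * (1.0 - h) for w, h in zip(self.w2, hidden)]
        for j, h in enumerate(hidden):
            self.w2[j] -= lr * delta_out * h
        self.b2 -= lr * delta_out
        for j, d in enumerate(delta_hidden):
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * d * xi
            self.b1[j] -= lr * d

def mean_loss(net, data):
    """Average squared-error loss over (pixels, label) pairs."""
    return sum(0.5 * (net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)
```

In use, each signature image would be flattened into a list of 0/1 pixels with label 1 for genuine and 0 for forgery, and `train_step` would be called repeatedly over the pool.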


The Cocktail Party Problem: Speech/Data Signal Separation Comparison between Backpropagation and SONN

Neural Information Processing Systems

This work introduces a new method called the Self Organizing Neural Network (SONN) algorithm and compares its performance with backpropagation (BP) in a signal separation application. The problem is to separate two signals, a modem data signal and a male speech signal, that are added and transmitted through a 4 kHz channel. The signals are sampled at 8 kHz, and using supervised learning, an attempt is made to reconstruct them. The SONN is an algorithm that constructs its own network topology during training; the resulting network is shown to be much smaller than the BP network, faster to train, and free from the trial-and-error network design that characterizes BP.


Asymptotic Convergence of Backpropagation: Numerical Experiments

Neural Information Processing Systems

We have calculated, both analytically and in simulations, the rate of convergence at long times in the backpropagation learning algorithm for networks with and without hidden units. Our basic finding for units using the standard sigmoid transfer function is 1/t convergence of the error for large t, with at most logarithmic corrections for networks with hidden units. Other transfer functions may lead to a slower polynomial rate of convergence. Our analytic calculations were presented in (Tesauro, He & Ahmad, 1989). Here we focus in more detail on our empirical measurements of the convergence rate in numerical simulations, which confirm our analytic results.
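
A toy numerical check of the 1/t behaviour, under strong simplifying assumptions that are not from the paper: a single sigmoid unit with one weight, one training pattern with input 1 and target 1, and plain gradient descent on the squared error E(w) = (1 - sigmoid(w))^2. Since the target is only reached as w grows without bound, the error keeps shrinking, and for large t it should roughly halve each time t doubles.

```python
import math

def error_trace(steps, lr=1.0, w=0.0):
    """Gradient descent on E(w) = (1 - sigmoid(w))^2; returns E at every step."""
    errors = []
    for _ in range(steps):
        s = 1.0 / (1.0 + math.exp(-w))
        errors.append((1.0 - s) ** 2)
        # dE/dw = -2 * s * (1 - s)^2, so descending the gradient increases w.
        w += lr * 2.0 * s * (1.0 - s) ** 2
    return errors

errors = error_trace(20001)
ratio = errors[10000] / errors[20000]  # close to 2 if E(t) behaves like 1/t
```

This only illustrates the single-unit case; it says nothing about the logarithmic corrections reported for networks with hidden units.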


Closed-Form Inversion of Backpropagation Networks: Theory and Optimization Issues

Neural Information Processing Systems

We describe a closed-form technique for mapping the output of a trained backpropagation network into input activity space. The mapping is an inverse mapping in the sense that, when the image of the mapping in input activity space is propagated [...] When more than one such inverse mapping exists, our inverse mapping is special in that it has no projection onto the nullspace of the activation flow operator for the entire network. An important by-product of our calculation, when more than one inverse mapping exists, is an orthogonal basis set of a significant portion of the activation flow operator nullspace. This basis set can be used to obtain an alternate inverse mapping that is optimized for a particular real-world application.


Kernel Regression and Backpropagation Training With Noise

Neural Information Processing Systems

One method proposed for improving the generalization capability of a feedforward network trained with the backpropagation algorithm is to use artificial training vectors which are obtained by adding noise to the original training vectors. We discuss the connection of such backpropagation training with noise to kernel density and kernel regression estimation. We compare by simulated examples (1) backpropagation, (2) backpropagation with noise, and (3) kernel regression in mapping estimation and pattern classification contexts.
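
The noise-injection step can be sketched as follows. The abstract does not specify the noise distribution, so the zero-mean Gaussian noise, the `sigma` and `copies` parameters, and the function name are assumptions chosen for illustration.

```python
import random

def augment_with_noise(vectors, targets, copies, sigma, seed=0):
    """Return `copies` jittered versions of every training vector,
    each keeping the target of the vector it was derived from."""
    rng = random.Random(seed)
    noisy_vectors, noisy_targets = [], []
    for vector, target in zip(vectors, targets):
        for _ in range(copies):
            # Independent zero-mean Gaussian noise on every component (assumed).
            noisy_vectors.append([v + rng.gauss(0.0, sigma) for v in vector])
            noisy_targets.append(target)
    return noisy_vectors, noisy_targets
```

The artificial vectors would then be fed to an ordinary backpropagation training loop in place of, or alongside, the original ones; `sigma` plays a role loosely analogous to a kernel bandwidth.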


Diffusion Approximations for the Constant Learning Rate Backpropagation Algorithm and Resistence to Local Minima

Neural Information Processing Systems

In this paper we discuss the asymptotic properties of the most commonly used variant of the backpropagation algorithm in which network weights are trained by means of a local gradient descent on examples drawn randomly from a fixed training set, and the learning rate η of the gradient updates is held constant (simple backpropagation). Using stochastic approximation results, we show that for η → 0 this training process approaches a batch training and provide results on the rate of convergence. Further, we show that for small η one can approximate simple backpropagation by the sum of a batch training process and a Gaussian diffusion which is the unique solution to a linear stochastic differential equation. Using this approximation we indicate the reasons why simple backpropagation is less likely to get stuck in local minima than the batch training process and demonstrate this empirically on a number of examples.
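
The difference between the two training processes can be seen on a one-parameter toy problem. The data set, the linear model y = w * x, and the function names below are hypothetical and chosen only for illustration; the paper's analysis concerns general networks. Batch training follows the average gradient, while the constant-rate online process updates on one randomly drawn example at a time and therefore fluctuates around the batch trajectory.

```python
import random

DATA = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]  # hypothetical (x, y) pairs

def grad(w, x, y):
    """Gradient of 0.5 * (w * x - y)^2 with respect to w."""
    return (w * x - y) * x

def batch_training(eta, steps, w=0.0):
    """Gradient descent on the average loss over the whole training set."""
    for _ in range(steps):
        w -= eta * sum(grad(w, x, y) for x, y in DATA) / len(DATA)
    return w

def simple_backprop(eta, steps, w=0.0, seed=0):
    """Online updates on randomly drawn examples with a constant learning rate."""
    rng = random.Random(seed)
    for _ in range(steps):
        x, y = rng.choice(DATA)
        w -= eta * grad(w, x, y)
    return w
```

With a small rate such as `eta=0.01`, both functions end near the least-squares solution, and the gap between them shrinks as `eta` is reduced, which is the qualitative behaviour the abstract describes for η → 0.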


Bayesian Backpropagation Over I-O Functions Rather Than Weights

Neural Information Processing Systems

The conventional Bayesian justification of backprop is that it finds the MAP weight vector. As this paper shows, to find the MAP i-o function instead one must add a correction term to backprop.


Credit Assignment through Time: Alternatives to Backpropagation

Neural Information Processing Systems

Learning to recognize or predict sequences using long-term context has many applications. However, practical and theoretical problems are found in training recurrent neural networks to perform tasks in which input/output dependencies span long intervals. Starting from a mathematical analysis of the problem, we consider and compare alternative algorithms and architectures on tasks for which the span of the input/output dependencies can be controlled. Results on the new algorithms show performance qualitatively superior to that obtained with backpropagation.