Convergence of energy-based learning in linear resistive networks

Anne-Men Huijzer, Thomas Chaffey, Bart Besselink, Henk J. van Waarde

arXiv.org Artificial Intelligence 

M. A. Huijzer, B. Besselink, and H. J. van Waarde are with the Bernoulli Institute for Mathematics, Computer Science, and Artificial Intelligence, University of Groningen, Groningen, The Netherlands; email: m.a.huijzer@rug.nl. T. Chaffey was with the Control Group, Department of Engineering, University of Cambridge, UK, and is now with the School of Electrical and Computer Engineering, University of Sydney, Australia; email: thomas.chaffey@sydney.edu.au.

Abstract -- Energy-based learning algorithms are alternatives to backpropagation that are well suited to distributed implementation in analog electronic devices. However, a rigorous theory of convergence is lacking. We make a first step in this direction by analysing a particular energy-based learning algorithm, Contrastive Learning, applied to a network of linear adjustable resistors. It is shown that, in this setup, Contrastive Learning is equivalent to projected gradient descent on a convex function, for any step size, giving a guarantee of convergence for the algorithm.

Backpropagation is the most popular method of training artificial neural networks. However, while artificial neural networks are inspired by biological nervous systems, it has long been observed that backpropagation is not biologically plausible [1]-[3]. Several biologically plausible alternatives to backpropagation have been proposed in the literature, among them so-called energy-based learning algorithms [4]-[11]. These algorithms apply to energy-based models, which come equipped with some generalized notion of energy and associate to each input a minimum of this energy.

The basic idea is to probe the system in two states, one free and one clamped (that is, dictated by the training data), and to use the energy difference between these states as a cost function. An iterative procedure is then applied to minimise this cost function. Several clamping mechanisms and iterative procedures have been defined, among them Contrastive Learning [4], [5], [12], Equilibrium Propagation [7], Coupled Learning [9] and Temporal Contrastive Learning [13]. These algorithms all resemble gradient descent, where the gradient of the cost function is replaced by a gradient-like quantity that can be computed in a distributed manner across the network.

The energy-based learning paradigm is particularly suited to learning in analog electronic devices, as these have a natural notion of generalized energy: the heat dissipated by electrical resistance (in this case a power, rather than an energy). Interest in such devices is due, in part, to their ability to perform inference many times faster than conventional neural networks [20]-[22].
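To fix ideas, the two-phase procedure can be written in generic notation; the following is a sketch, not the paper's exact formulation. Write $E(\theta, s)$ for the energy of state $s$ under parameters $\theta$, $s^{\mathrm{free}}$ for the energy-minimising state when only the inputs are fixed, and $s^{\mathrm{clamped}}$ for the minimiser when the outputs are also fixed to their targets. The contrastive cost and an update with step size $\eta$ then read

    \mathcal{C}(\theta) \;=\; E\big(\theta, s^{\mathrm{clamped}}\big) \;-\; E\big(\theta, s^{\mathrm{free}}\big),
    \qquad
    \theta_{k+1} \;=\; \Pi_{\Theta}\big(\theta_k - \eta\, g(\theta_k)\big),

where $g(\theta)$ is the gradient-like quantity computed across the network and $\Pi_{\Theta}$ projects onto the feasible parameter set (for a resistive network, plausibly the nonnegative conductances; the exact constraint set here is an assumption). The paper's result is that, for networks of linear adjustable resistors, this iteration coincides with projected gradient descent on a convex function.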
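To make the resistive-network setting concrete, below is a minimal numerical sketch assuming a standard formulation from the energy-based learning literature, not necessarily the paper's exact model: edge conductances are the trainable parameters, the energy is the dissipated power P(g, v) = sum_e g_e (v_i - v_j)^2, the free phase clamps only the input node voltages, the clamped phase also clamps the outputs to their targets, and each step is a contrastive update followed by a projection keeping conductances positive. The graph, node roles, and constants are all illustrative.

    import numpy as np

    # Illustrative network: 4 nodes, 4 resistors (all values are assumptions).
    edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
    n = 4
    g = np.ones(len(edges))       # trainable conductances
    inputs = {0: 1.0, 1: 0.0}     # input nodes: clamped in both phases
    outputs = {3: 0.4}            # output node: clamped only in the clamped phase

    def equilibrium(g, clamped):
        """Node voltages at electrical equilibrium: the unclamped voltages
        minimise the dissipated power, i.e. satisfy Kirchhoff's current law
        L_ff v_f = -L_fc v_c for the weighted graph Laplacian L(g)."""
        L = np.zeros((n, n))
        for ge, (i, j) in zip(g, edges):
            L[i, i] += ge; L[j, j] += ge
            L[i, j] -= ge; L[j, i] -= ge
        c = sorted(clamped)
        f = [k for k in range(n) if k not in clamped]
        v = np.zeros(n)
        v[c] = [clamped[k] for k in c]
        v[f] = np.linalg.solve(L[np.ix_(f, f)], -L[np.ix_(f, c)] @ v[c])
        return v

    eta = 0.05
    for _ in range(1000):
        v_free = equilibrium(g, inputs)                  # free phase
        v_cl = equilibrium(g, {**inputs, **outputs})     # clamped phase
        dv2_free = np.array([(v_free[i] - v_free[j]) ** 2 for i, j in edges])
        dv2_cl = np.array([(v_cl[i] - v_cl[j]) ** 2 for i, j in edges])
        # dP/dg_e = (v_i - v_j)^2, so a gradient step on the contrastive cost
        # P_clamped - P_free needs only each resistor's own voltage drops:
        g = g - eta * (dv2_cl - dv2_free)
        g = np.clip(g, 1e-3, None)   # projection: keep conductances positive

    print("free-phase output:", equilibrium(g, inputs)[3], "target:", outputs[3])

Note that the update for each resistor depends only on that resistor's own voltage drop in the two phases, which is the sense in which the gradient-like quantity can be computed in a distributed manner across the network.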
