Backpropagation
Training all-mechanical neural networks for task learning through in situ backpropagation
Recent advances unveiled physical neural networks as promising machine learning platforms, offering faster and more energy-efficient information processing. Compared with extensively-studied optical neural networks, the development of mechanical neural networks (MNNs) remains nascent and faces significant challenges, including heavy computational demands and learning with approximate gradients. Here, we introduce the mechanical analogue of in situ backpropagation to enable highly efficient training of MNNs. We demonstrate that the exact gradient can be obtained locally in MNNs, enabling learning through their immediate vicinity. With the gradient information, we showcase the successful training of MNNs for behavior learning and machine learning tasks, achieving high accuracy in regression and classification. Furthermore, we present the retrainability of MNNs involving task-switching and damage, demonstrating the resilience. Our findings, which integrate the theory for training MNNs and experimental and numerical validations, pave the way for mechanical machine learning hardware and autonomous self-learning material systems.
On the relationship between predictive coding and backpropagation
Artificial neural networks are often interpreted as abstract models of biological neuronal networks, but they are typically trained using the biologically unrealistic backpropagation algorithm and its variants. Predictive coding has been proposed as a potentially more biologically realistic alternative to backpropagation for training neural networks. This manuscript reviews and extends recent work on the mathematical relationship between predictive coding and backpropagation for training feedforward artificial neural networks on supervised learning tasks. Implications of these results for the interpretation of predictive coding and deep neural networks as models of biological learning are discussed along with a repository of functions, Torch2PC, for performing predictive coding with PyTorch neural network models.
Time Series Compression using Quaternion Valued Neural Networks and Quaternion Backpropagation
Pöppelbaum, Johannes, Schwung, Andreas
We propose a novel quaternionic time-series compression methodology where we divide a long time-series into segments of data, extract the min, max, mean and standard deviation of these chunks as representative features and encapsulate them in a quaternion, yielding a quaternion valued time-series. This time-series is processed using quaternion valued neural network layers, where we aim to preserve the relation between these features through the usage of the Hamilton product. To train this quaternion neural network, we derive quaternion backpropagation employing the GHR calculus, which is required for a valid product and chain rule in quaternion space. Furthermore, we investigate the connection between the derived update rules and automatic differentiation. We apply our proposed compression method on the Tennessee Eastman Dataset, where we perform fault classification using the compressed data in two settings: a fully supervised one and in a semi supervised, contrastive learning setting. Both times, we were able to outperform real valued counterparts as well as two baseline models: one with the uncompressed time-series as the input and the other with a regular downsampling using the mean. Further, we could improve the classification benchmark set by SimCLR-TS from 81.43% to 83.90%.
Fast Second-Order Stochastic Backpropagation for Variational Inference Jeffrey Beck Duke University HKUST Katherine Heller HKUST
We propose a second-order (Hessian or Hessian-free) based optimization method for variational inference inspired by Gaussian backpropagation, and argue that quasi-Newton optimization can be developed as well. This is accomplished by generalizing the gradient computation in stochastic backpropagation via a reparametrization trick with lower complexity. As an illustrative example, we apply this approach to the problems of Bayesian logistic regression and variational auto-encoder (VAE). Additionally, we compute bounds on the estimator variance of intractable expectations for the family of Lipschitz continuous function. Our method is practical, scalable and model free. We demonstrate our method on several real-world datasets and provide comparisons with other stochastic gradient methods to show substantial enhancement in convergence rates.
Backpropagation for Energy-Efficient Neuromorphic Computing
Solving real world problems with embedded neural networks requires both training algorithms that achieve high performance and compatible hardware that runs in real time while remaining energy efficient. For the former, deep learning using backpropagation has recently achieved a string of successes across many domains and datasets. For the latter, neuromorphic chips that run spiking neural networks have recently achieved unprecedented energy efficiency. To bring these two advances together, we must first resolve the incompatibility between backpropagation, which uses continuous-output neurons and synaptic weights, and neuromorphic designs, which employ spiking neurons and discrete synapses. Our approach is to treat spikes and discrete synapses as continuous probabilities, which allows training the network using standard backpropagation. The trained network naturally maps to neuromorphic hardware by sampling the probabilities to create one or more networks, which are merged using ensemble averaging. To demonstrate, we trained a sparsely connected network that runs on the TrueNorth chip using the MNIST dataset. With a high performance network (ensemble of 64), we achieve 99.42% accuracy at 108 µJ per image, and with a high efficiency network (ensemble of 1) we achieve 92.7% accuracy at 0.268 µJ per image.
Backpropagation-Based Analytical Derivatives of EKF Covariance for Active Sensing
Benhamou, Jonas, Bonnabel, Silvère, Chapdelaine, Camille
In robotics, perception-aware (PA) approaches, [1, 2, 3, 4], or active sensing approaches, seek trajectories that maximize information gathered from sensors so as to perform robotic tasks safely. Notably, in the context of ground vehicles, when localization is based on ranging or bearing measurements relative to beacons, the efficiency of active sensing has been shown by [5, 6]. In [7], trajectories are generated to perform optimal online calibration between GPS and inertial measurement unit (IMU), see also [8]. In [3], for visual-inertial navigation systems, the authors have optimized the duration in which landmarks remain within the field of view. In the context of simultaneous localization and mapping (SLAM), those methods pertain to active SLAM, see [9].
Training Implicit Networks for Image Deblurring using Jacobian-Free Backpropagation
Liu, Linghai, Tong, Shuaicheng, Zhao, Lisa
Recent efforts in applying implicit networks to solve inverse problems in imaging have achieved competitive or even superior results when compared to feedforward networks. These implicit networks only require constant memory during backpropagation, regardless of the number of layers. However, they are not necessarily easy to train. Gradient calculations are computationally expensive because they require backpropagating through a fixed point. In particular, this process requires solving a large linear system whose size is determined by the number of features in the fixed point iteration. This paper explores a recently proposed method, Jacobian-free Backpropagation (JFB), a backpropagation scheme that circumvents such calculation, in the context of image deblurring problems. Our results show that JFB is comparable against fine-tuned optimization schemes, state-of-the-art (SOTA) feedforward networks, and existing implicit networks at a reduced computational cost.
Backpropagation Through Agents
Li, Zhiyuan, Zhao, Wenshuai, Wu, Lijun, Pajarinen, Joni
A fundamental challenge in multi-agent reinforcement learning (MARL) is to learn the joint policy in an extremely large search space, which grows exponentially with the number of agents. Moreover, fully decentralized policy factorization significantly restricts the search space, which may lead to sub-optimal policies. In contrast, the auto-regressive joint policy can represent a much richer class of joint policies by factorizing the joint policy into the product of a series of conditional individual policies. While such factorization introduces the action dependency among agents explicitly in sequential execution, it does not take full advantage of the dependency during learning. In particular, the subsequent agents do not give the preceding agents feedback about their decisions. In this paper, we propose a new framework Back-Propagation Through Agents (BPTA) that directly accounts for both agents' own policy updates and the learning of their dependent counterparts. This is achieved by propagating the feedback through action chains. With the proposed framework, our Bidirectional Proximal Policy Optimisation (BPPO) outperforms the state-of-the-art methods. Extensive experiments on matrix games, StarCraftII v2, Multi-agent MuJoCo, and Google Research Football demonstrate the effectiveness of the proposed method.
A Theoretical View of Linear Backpropagation and Its Convergence
Li, Ziang, Guo, Yiwen, Liu, Haodi, Zhang, Changshui
Backpropagation (BP) is widely used for calculating gradients in deep neural networks (DNNs). Applied often along with stochastic gradient descent (SGD) or its variants, BP is considered as a de-facto choice in a variety of machine learning tasks including DNN training and adversarial attack/defense. Recently, a linear variant of BP named LinBP was introduced for generating more transferable adversarial examples for performing black-box attacks, by Guo et al. Although it has been shown empirically effective in black-box attacks, theoretical studies and convergence analyses of such a method is lacking. This paper serves as a complement and somewhat an extension to Guo et al.'s paper, by providing theoretical analyses on LinBP in neural-network-involved learning tasks, including adversarial attack and model training. We demonstrate that, somewhat surprisingly, LinBP can lead to faster convergence in these tasks in the same hyper-parameter settings, compared to BP. We confirm our theoretical results with extensive experiments.
Digital twinning of cardiac electrophysiology models from the surface ECG: a geodesic backpropagation approach
Grandits, Thomas, Verhülsdonk, Jan, Haase, Gundolf, Effland, Alexander, Pezzuto, Simone
The eikonal equation has become an indispensable tool for modeling cardiac electrical activation accurately and efficiently. In principle, by matching clinically recorded and eikonal-based electrocardiograms (ECGs), it is possible to build patient-specific models of cardiac electrophysiology in a purely non-invasive manner. Nonetheless, the fitting procedure remains a challenging task. The present study introduces a novel method, Geodesic-BP, to solve the inverse eikonal problem. Geodesic-BP is well-suited for GPU-accelerated machine learning frameworks, allowing us to optimize the parameters of the eikonal equation to reproduce a given ECG. We show that Geodesic-BP can reconstruct a simulated cardiac activation with high accuracy in a synthetic test case, even in the presence of modeling inaccuracies. Furthermore, we apply our algorithm to a publicly available dataset of a biventricular rabbit model, with promising results. Given the future shift towards personalized medicine, Geodesic-BP has the potential to help in future functionalizations of cardiac models meeting clinical time constraints while maintaining the physiological accuracy of state-of-the-art cardiac models.