Goto

Collaborating Authors

 Backpropagation


Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder

arXiv.org Machine Learning

Inference and prediction under partial knowledge of a physical system is challenging, particularly when multiple confounding sources influence the measured response. Explicitly accounting for these influences in physics-based models is often infeasible due to epistemic uncertainty, cost, or time constraints, resulting in models that fail to accurately describe the behavior of the system. On the other hand, data-driven machine learning models such as variational autoencoders are not guaranteed to identify a parsimonious representation. As a result, they can suffer from poor generalization performance and reconstruction accuracy in the regime of limited and noisy data. We propose a physics-informed variational autoencoder architecture that combines the interpretability of physics-based models with the flexibility of data-driven models. To promote disentanglement of the known physics and confounding influences, the latent space is partitioned into physically meaningful variables that parametrize a physics-based model, and data-driven variables that capture variability in the domain and class of the physical system. The encoder is coupled with a decoder that integrates physics-based and data-driven components, and constrained by an adversarial training objective that prevents the data-driven components from overriding the known physics, ensuring that the physics-grounded latent variables remain interpretable. We demonstrate that the model is able to disentangle features of the input signal and separate the known physics from confounding influences using supervision in the form of class and domain observables. The model is evaluated on a series of synthetic case studies relevant to engineering structures, demonstrating the feasibility of the proposed approach.


Backpropagation Through Time For Networks With Long-Term Dependencies

arXiv.org Artificial Intelligence

Backpropagation through time (BPTT) is a technique of updating tuned parameters within recurrent neural networks (RNNs). Several attempts at creating such an algorithm have been made including: Nth Ordered Approximations and Truncated-BPTT. These methods approximate the backpropagation gradients under the assumption that the RNN only utilises short-term dependencies. This is an acceptable assumption to make for the current state of artificial neural networks. As RNNs become more advanced, a shift towards influence by long-term dependencies is likely. Thus, a new method for backpropagation is required. We propose using the 'discrete forward sensitivity equation' and a variant of it for single and multiple interacting recurrent loops respectively. This solution is exact and also allows the network's parameters to vary between each subsequent step, however it does require the computation of a Jacobian.


SUS backprop: linear backpropagation algorithm for long inputs in transformers

arXiv.org Artificial Intelligence

It is straightforward to design an unbiased gradient estimator that stochastically cuts the backpropagation flow through any part of a computational graph. By cutting the parts that have little effect on the computation, one can potentially save a significant amount of backpropagation computation in exchange for a minimal increase in the stochastic gradient variance, in some situations. Such a situation occurs in the attention mechanism of the transformer architecture. For long sequences, attention becomes the limiting factor, as its compute requirements increase quadratically with sequence length $n$. At the same time, most attention weights become very small, as most attention heads tend to connect a given token with only a small fraction of other tokens in the sequence. These weights become promising targets for cutting backpropagation. We propose a simple probabilistic rule controlled by a single parameter $c$ that cuts back-propagation through most attention weights, leaving at most $c$ interactions per token per attention head. This brings a factor of $c/n$ reduction in the compute required for the attention backpropagation, turning it from quadratic $O(n^2)$ to linear complexity $O(nc)$. We have empirically verified that, for a typical transformer model, cutting about $99\%$ of the attention gradient flow (i.e. choosing $c \sim 25-30$) results in relative gradient variance increase of only about $1\%$ for $n \sim 2000$, and it decreases with $n$. This approach is amenable to efficient sparse matrix implementation, thus being promising for making the cost of a backward pass negligible relative to the cost of a forward pass when training a transformer model on long sequences.


Exploring the Performance of Perforated Backpropagation through Further Experiments

arXiv.org Artificial Intelligence

Perforated Backpropagation is a neural network optimization technique based on modern understanding of the computational importance of dendrites within biological neurons. This paper explores further experiments from the original publication, generated from a hackathon held at the Carnegie Mellon Swartz C enter in February 2025. Students and local Pittsburgh ML practitioners were brought together to experiment with the Perforated Backpropagation algorithm on the datasets and models which they were using for their projects. Results showed that the system could enhance their projects, with up to 90% model compression without negative impact on accuracy, or up to 16% increased accuracy of their original models.


Dynamic Spectral Backpropagation for Efficient Neural Network Training

arXiv.org Artificial Intelligence

Dynamic Spectral Backpropagation (DSBP) enhances neural network training under resource constraints by projecting gradients onto principal eigenvectors, reducing complexity and promoting flat minima. Five extensions are proposed, dynamic spectral inference, spectral architecture optimization, spectral meta learning, spectral transfer regularization, and Lie algebra inspired dynamics, to address challenges in robustness, fewshot learning, and hardware efficiency. Supported by a third order stochastic differential equation (SDE) and a PAC Bayes limit, DSBP outperforms Sharpness Aware Minimization (SAM), Low Rank Adaptation (LoRA), and Model Agnostic Meta Learning (MAML) on CIFAR 10, Fashion MNIST, MedMNIST, and Tiny ImageNet, as demonstrated through extensive experiments and visualizations. Future work focuses on scalability, bias mitigation, and ethical considerations.


Advancing Training Efficiency of Deep Spiking Neural Networks through Rate-based Backpropagation

Neural Information Processing Systems

Recent insights have revealed that rate-coding is a primary form of information representation captured by surrogate-gradient-based Backpropagation Through Time (BPTT) in training deep Spiking Neural Networks (SNNs). Motivated by these findings, we propose rate-based backpropagation, a training strategy specifically designed to exploit rate-based representations to reduce the complexity of BPTT. Our method minimizes reliance on detailed temporal derivatives by focusing on averaged dynamics, streamlining the computational graph to reduce memory and computational demands of SNNs training. We substantiate the rationality of the gradient approximation between BPTT and the proposed method through both theoretical analysis and empirical observations. Comprehensive experiments on CIFAR-10, CIFAR-100, ImageNet, and CIFAR10-DVS validate that our method achieves comparable performance to BPTT counterparts, and surpasses state-of-the-art efficient training techniques.


BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models

Neural Information Processing Systems

Large Language Models (LLMs) often suffer from overconfidence during inference, particularly when adapted to downstream domain-specific tasks with limited data. Previous work addresses this issue by employing approximate Bayesian estimation after the LLMs are trained, enabling them to quantify uncertainty. However, such post-training approaches' performance is severely limited by the parameters learned during training. In this paper, we go beyond post-training Bayesianization and propose Bayesian Low-Rank Adaptation by Backpropagation (BLoB), an algorithm that continuously and jointly adjusts both the mean and covariance of LLM parameters throughout the whole fine-tuning process. Our empirical results verify the effectiveness of BLoB in terms of generalization and uncertainty estimation, when evaluated on both in-distribution and out-of-distribution data.


Convergence and Alignment of Gradient Descent with Random Backpropagation Weights

Neural Information Processing Systems

Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure---updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible "feedback alignment" algorithm that uses random and fixed backpropagation weights, and show promising simulations.


Backpropagation-Free Metropolis-Adjusted Langevin Algorithm

arXiv.org Artificial Intelligence

Recent work on backpropagation-free learning has shown that it is possible to use forward-mode automatic differentiation (AD) to perform optimization on differentiable models. Forward-mode AD requires sampling a tangent vector for each forward pass of a model. The result is the model evaluation with the directional derivative along the tangent. In this paper, we illustrate how the sampling of this tangent vector can be incorporated into the proposal mechanism for the Metropolis-Adjusted Langevin Algorithm (MALA). As such, we are the first to introduce a backpropagation-free gradient-based Markov chain Monte Carlo (MCMC) algorithm. We also extend to a novel backpropagation-free position-specific preconditioned forward-mode MALA that leverages Hessian information. Overall, we propose four new algorithms: Forward MALA; Line Forward MALA; Pre-conditioned Forward MALA, and Pre-conditioned Line Forward MALA. We highlight the reduced computational cost of the forward-mode samplers and show that forward-mode is competitive with the original MALA, while even outperforming it depending on the probabilistic model. We include Bayesian inference results on a range of probabilistic models, including hierarchical distributions and Bayesian neural networks.


You Only Look One Step: Accelerating Backpropagation in Diffusion Sampling with Gradient Shortcuts

arXiv.org Artificial Intelligence

Diffusion models (DMs) have recently demonstrated remarkable success in modeling large-scale data distributions. However, many downstream tasks require guiding the generated content based on specific differentiable metrics, typically necessitating backpropagation during the generation process. This approach is computationally expensive, as generating with DMs often demands tens to hundreds of recursive network calls, resulting in high memory usage and significant time consumption. In this paper, we propose a more efficient alternative that approaches the problem from the perspective of parallel denoising. We show that full backpropagation throughout the entire generation process is unnecessary. The downstream metrics can be optimized by retaining the computational graph of only one step during generation, thus providing a shortcut for gradient propagation. The resulting method, which we call Shortcut Diffusion Optimization (SDO), is generic, high-performance, and computationally lightweight, capable of optimizing all parameter types in diffusion sampling. We demonstrate the effectiveness of SDO on several real-world tasks, including controlling generation by optimizing latent and aligning the DMs by fine-tuning network parameters. Compared to full backpropagation, our approach reduces computational costs by $\sim 90\%$ while maintaining superior performance. Code is available at https://github.com/deng-ai-lab/SDO.