 Backpropagation


ABPT: Amended Backpropagation through Time with Partially Differentiable Rewards

arXiv.org Artificial Intelligence

Using the exact gradients of the rewards to directly optimize policy parameters via backpropagation-through-time (BPTT) enables high training performance for quadrotor tasks. However, designing a fully differentiable reward architecture is often challenging. Partially differentiable rewards will result in biased gradient propagation that degrades training performance. To overcome this limitation, we propose Amended Backpropagation-through-Time (ABPT), a novel approach that mitigates gradient bias while preserving the training efficiency of BPTT. ABPT combines 0-step and N-step returns, effectively reducing the bias by leveraging value gradients from the learned Q-value function. Additionally, it adopts entropy regularization and state initialization mechanisms to encourage exploration during training. We evaluate ABPT on four representative quadrotor flight tasks. Experimental results demonstrate that ABPT converges significantly faster and achieves higher ultimate rewards than existing learning algorithms, particularly in tasks involving partially differentiable rewards.


Review for NeurIPS paper: Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation

Neural Information Processing Systems

Additional Feedback: - To me, the fact that learning was not much slower than standard supervised learning seems like the most important result of the paper, and I would have liked to see more analysis of how this works (rather than just a report of the empirical result). Additionally it would be nice to see a more systematic exploration of how this scales with the number of classes, including greater numbers of classes. This is an important and strong statement about physiology, but I'm not sure the references support it. Many references are given, but this isn't the main topic of any of them. I looked fairly carefully for support for this statement in the first reference and didn't find it.


Review for NeurIPS paper: Attention-Gated Brain Propagation: How the brain can implement reward-based error backpropagation

Neural Information Processing Systems

The reviewers agreed that this paper provides an important contribution to the biological learning literature, and agreed that it should be accepted. However, the reviewers were also in agreement that the authors must do the following for the camera-ready version of the paper: 1) Provide greater clarity that this is an extension of AGREL and does not involve any changes to the core AGREL algorithm, but rather, a means of gating the attention signals sent back through multiple layers.


Reviews: Memory-Efficient Backpropagation Through Time

Neural Information Processing Systems

The authors are solving an important problem. RNN training procedures can be greedy for memory. And, given the sequential nature, it's not trivial to simply scale the training of each sequence over many machines. As a result, it's important to judiciously use memory and computational resources to train RNNs efficiently. I'm pleased to see the authors not only proposing a new instance of a solution, but also providing a user-selectable tradeoff between the quantity of computation and the memory usage.
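The computation-versus-memory tradeoff mentioned in this review can be illustrated with a minimal sketch. It assumes a toy scalar RNN, h_t = tanh(w * h_{t-1} + x_t), with the loss taken to be the final state, and it uses plain fixed-interval checkpointing rather than the reviewed paper's own scheme. The function names and the `every` parameter are hypothetical.

```python
import math

def step(w, h, x):
    # One step of the toy scalar RNN (an assumption for illustration).
    return math.tanh(w * h + x)

def grad_full(w, xs, h0=0.0):
    """Standard BPTT: keep every hidden state, then sweep backward once."""
    hs = [h0]
    for x in xs:
        hs.append(step(w, hs[-1], x))
    dh, dw = 1.0, 0.0  # loss is the final hidden state
    for t in range(len(xs), 0, -1):
        da = dh * (1.0 - hs[t] ** 2)  # gradient through tanh
        dw += da * hs[t - 1]
        dh = da * w
    return dw, len(hs)  # gradient, number of stored states

def grad_checkpointed(w, xs, every, h0=0.0):
    """Keep only every `every`-th state; recompute each segment when needed."""
    T = len(xs)
    ckpt = {0: h0}
    h = h0
    for t, x in enumerate(xs, start=1):
        h = step(w, h, x)
        if t % every == 0 and t < T:
            ckpt[t] = h
    dh, dw = 1.0, 0.0
    peak = len(ckpt)
    end = T
    for start in sorted(ckpt, reverse=True):
        # Recompute the hidden states h_start .. h_end from the checkpoint.
        seg = [ckpt[start]]
        for t in range(start, end):
            seg.append(step(w, seg[-1], xs[t]))
        # seg[0] duplicates a checkpoint, so it is not counted twice.
        peak = max(peak, len(ckpt) + len(seg) - 1)
        for i in range(len(seg) - 1, 0, -1):
            da = dh * (1.0 - seg[i] ** 2)
            dw += da * seg[i - 1]
            dh = da * w
        end = start
    return dw, peak  # gradient, peak number of stored states

xs = [0.1 * i for i in range(12)]
g_full, n_full = grad_full(0.7, xs)
g_ckpt, n_ckpt = grad_checkpointed(0.7, xs, every=4)
```

Both functions return the same gradient. The checkpointed version holds fewer hidden states at once and pays for it with one extra forward pass per segment, so a larger `every` means fewer checkpoints but longer recomputed segments.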


Reviews: Learning Multiagent Communication with Backpropagation

Neural Information Processing Systems

The model is a deep network which consists of a stack of layers, with parameter sharing between modules of the same layer. This parameter sharing allows the number of agents to vary during the task. It also drastically reduces the number of parameters to be learned. The key idea of the paper is to use the output of every module of a given layer to build the communication input for the next layer. While this appears to obtain interesting results in the reported experiments, I find this proposal very straightforward and not very innovative, as it corresponds to a quite classical neural network structure.


On quantum backpropagation, information reuse, and cheating measurement collapse

Neural Information Processing Systems

The success of modern deep learning hinges on the ability to train neural networks at scale. Through clever reuse of intermediate information, backpropagation facilitates training through gradient computation at a total cost roughly proportional to running the function, rather than incurring an additional factor proportional to the number of parameters -- which can now be in the trillions. Naively, one expects that quantum measurement collapse entirely rules out the reuse of quantum information as in backpropagation. But recent developments in shadow tomography, which assumes access to multiple copies of a quantum state, have challenged that notion. Here, we investigate whether parameterized quantum models can train as efficiently as classical neural networks. We show that achieving backpropagation scaling is impossible without access to multiple copies of a state.


CBP: backpropagation with constraint on weight precision using a pseudo-Lagrange multiplier method

Neural Information Processing Systems

Backward propagation of errors (backpropagation) is a method to minimize objective functions (e.g., loss functions) of deep neural networks by identifying optimal sets of weights and biases. Imposing constraints on weight precision is often required to alleviate prohibitive workloads on hardware. Despite the remarkable success of backpropagation, the algorithm itself is not capable of considering such constraints unless additional algorithms are applied simultaneously. To address this issue, we propose the constrained backpropagation (CBP) algorithm based on the pseudo-Lagrange multiplier method to obtain the optimal set of weights that satisfy a given set of constraints. The defining characteristic of the proposed CBP algorithm is the utilization of a Lagrangian function (loss function plus constraint function) as its objective function.
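The idea of optimizing a Lagrangian (loss plus a multiplier times a constraint function) can be sketched on a toy problem. The example below is not the CBP update rule or its weight-precision constraint. It assumes a single weight w, the loss (w - 2)^2, and a hypothetical equality constraint w - 1 = 0, and it runs generic gradient descent on w and gradient ascent on the multiplier.

```python
def constrained_minimize(steps=500, lr=0.1):
    """Descent-ascent on L(w, lam) = (w - 2)**2 + lam * (w - 1)."""
    w, lam = 0.0, 0.0
    for _ in range(steps):
        grad_w = 2.0 * (w - 2.0) + lam  # dL/dw: loss gradient plus constraint term
        grad_lam = w - 1.0              # dL/dlam: the constraint value itself
        # Descend in w, ascend in lam, using the same (old) iterate for both.
        w, lam = w - lr * grad_w, lam + lr * grad_lam
    return w, lam

w_opt, lam_opt = constrained_minimize()
```

The unconstrained minimizer would be w = 2, but the multiplier grows until the constraint is satisfied, so the iterates settle near w = 1 with a multiplier near 2.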


Real-Valued Backpropagation is Unsuitable for Complex-Valued Neural Networks

Neural Information Processing Systems

Recently complex-valued neural networks have received increasing attention due to successful applications in various tasks and the potential advantages of better theoretical properties and richer representational capacity. However, the training dynamics of complex networks compared to real networks remains an open problem. In this paper, we investigate the dynamics of deep complex networks during real-valued backpropagation in the infinite-width limit via neural tangent kernel (NTK). We first extend the Tensor Program to the complex domain, to show that the dynamics of any basic complex network architecture is governed by its NTK under real-valued backpropagation. Then we propose a way to investigate the comparison of training dynamics between complex and real networks by studying their NTKs. As a result, we surprisingly prove that for most complex activation functions, the commonly used real-valued backpropagation reduces the training dynamics of complex networks to that of ordinary real networks as the widths tend to infinity, thus eliminating the characteristics of complex-valued neural networks.


Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation

Neural Information Processing Systems

Transfer learning from models trained on large datasets to customized downstream tasks has been widely used, as the pre-trained model can greatly boost generalizability. However, the increasing sizes of pre-trained models also lead to prohibitively large memory footprints for downstream transfer, making them unaffordable for personal devices. Previous work recognizes the bottleneck of the footprint to be the activation, and hence proposes various solutions such as injecting specific lite modules. In this work, we present a novel memory-efficient transfer framework called Back Razor, that can be plug-and-play applied to any pre-trained network without changing its architecture. The key idea of Back Razor is asymmetric sparsifying: pruning the activation stored for back-propagation, while keeping the forward activation dense.
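A minimal sketch of asymmetric sparsifying for a single linear layer follows. It assumes top-k magnitude pruning as the pruning criterion, which the abstract does not specify, and the function names are hypothetical. The pruned copy is kept as a dense list with zeros, so the sketch shows the data flow only. Real memory savings would need a sparse storage format.

```python
def prune_topk(x, keep):
    """Zero all but the `keep` largest-magnitude entries of x."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    kept = set(order[:keep])
    return [v if i in kept else 0.0 for i, v in enumerate(x)]

def linear_forward(W, x, keep):
    # The forward pass uses the dense activation x.
    y = [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]
    # Only a pruned copy of x is saved for the backward pass.
    saved = prune_topk(x, keep)
    return y, saved

def linear_backward(W, saved, dy):
    # The weight gradient is built from the pruned activation (an approximation).
    dW = [[g * s for s in saved] for g in dy]
    # The input gradient needs only W and dy, so it is unaffected by pruning.
    dx = [sum(W[i][j] * dy[i] for i in range(len(W))) for j in range(len(W[0]))]
    return dW, dx

W = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
x = [0.1, -2.0, 0.5]
y, saved = linear_forward(W, x, keep=2)
dW, dx = linear_backward(W, saved, dy=[1.0, -1.0])
```

Here the smallest entry of x is dropped from the saved copy, so the first column of dW is zero while y and dx are exactly what a dense layer would produce.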


Convergence and Alignment of Gradient Descent with Random Backpropagation Weights

Neural Information Processing Systems

Stochastic gradient descent with backpropagation is the workhorse of artificial neural networks. It has long been recognized that backpropagation fails to be a biologically plausible algorithm. Fundamentally, it is a non-local procedure---updating one neuron's synaptic weights requires knowledge of synaptic weights or receptive fields of downstream neurons. This limits the use of artificial neural networks as a tool for understanding the biological principles of information processing in the brain. Lillicrap et al. (2016) propose a more biologically plausible "feedback alignment" algorithm that uses random and fixed backpropagation weights, and show promising simulations.
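The difference between backpropagation and feedback alignment can be sketched at the point where the output error is sent back to a hidden layer. The example assumes tanh hidden units and a hypothetical helper `hidden_delta`. The matrices are illustrative values, not taken from any paper.

```python
def hidden_delta(feedback, err, hidden):
    """Hidden-layer error signal: (feedback @ err) * tanh'(pre-activation).

    `feedback` has one row per hidden unit and one column per output unit.
    `hidden` holds the tanh outputs, so the derivative is 1 - h**2.
    """
    return [
        sum(f_ij * e_j for f_ij, e_j in zip(row, err)) * (1.0 - h * h)
        for row, h in zip(feedback, hidden)
    ]

W2 = [[0.5, -1.0]]                  # forward weights: 1 output, 2 hidden units
W2_T = [[0.5], [-1.0]]              # backpropagation sends the error back through W2 transposed
B = [[0.3], [0.7]]                  # feedback alignment: a fixed matrix, drawn once at random in practice

err = [2.0]                         # output error
hidden = [0.0, 0.5]                 # hidden activations

delta_bp = hidden_delta(W2_T, err, hidden)
delta_fa = hidden_delta(B, err, hidden)
```

With `W2_T` the hidden unit must know the downstream forward weights, which is the non-locality described above. With `B` it needs only a fixed feedback matrix that is never updated.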