Goto

Collaborating Authors

 Backpropagation


Structural Credit Assignment with Coordinated Exploration

arXiv.org Artificial Intelligence

A biologically plausible method for training an Artificial Neural Network (ANN) involves treating each unit as a stochastic Reinforcement Learning (RL) agent, thereby considering the network as a team of agents. Consequently, all units can learn via REINFORCE, a local learning rule modulated by a global reward signal, which aligns more closely with biologically observed forms of synaptic plasticity. However, this learning method tends to be slow and does not scale well with the size of the network. This inefficiency arises from two factors impeding effective structural credit assignment: (i) all units independently explore the network, and (ii) a single reward is used to evaluate the actions of all units. Accordingly, methods aimed at improving structural credit assignment can generally be classified into two categories. The first category includes algorithms that enable coordinated exploration among units, such as MAP propagation. The second category encompasses algorithms that compute a more specific reward signal for each unit within the network, like Weight Maximization and its variants. In this research report, our focus is on the first category. We propose the use of Boltzmann machines or a recurrent network for coordinated exploration. We show that the negative phase, which is typically necessary to train Boltzmann machines, can be removed. The resulting learning rules are similar to the reward-modulated Hebbian learning rule. Experimental results demonstrate that coordinated exploration significantly exceeds independent exploration in training speed for multiple stochastic and discrete units based on REINFORCE, even surpassing straight-through estimator (STE) backpropagation.


Concept backpropagation: An Explainable AI approach for visualising learned concepts in neural network models

arXiv.org Artificial Intelligence

Neural network models are widely used in a variety of domains, often as black-box solutions, since they are not directly interpretable for humans. The field of explainable artificial intelligence aims at developing explanation methods to address this challenge, and several approaches have been developed over the recent years, including methods for investigating what type of knowledge these models internalise during the training process. Among these, the method of concept detection, investigates which \emph{concepts} neural network models learn to represent in order to complete their tasks. In this work, we present an extension to the method of concept detection, named \emph{concept backpropagation}, which provides a way of analysing how the information representing a given concept is internalised in a given neural network model. In this approach, the model input is perturbed in a manner guided by a trained concept probe for the described model, such that the concept of interest is maximised. This allows for the visualisation of the detected concept directly in the input space of the model, which in turn makes it possible to see what information the model depends on for representing the described concept. We present results for this method applied to a various set of input modalities, and discuss how our proposed method can be used to visualise what information trained concept probes use, and the degree as to which the representation of the probed concept is entangled within the neural network model itself.


DNArch: Learning Convolutional Neural Architectures by Backpropagation

arXiv.org Artificial Intelligence

We present Differentiable Neural Architectures (DNArch), a method that jointly learns the weights and the architecture of Convolutional Neural Networks (CNNs) by backpropagation. In particular, DNArch allows learning (i) the size of convolutional kernels at each layer, (ii) the number of channels at each layer, (iii) the position and values of downsampling layers, and (iv) the depth of the network. To this end, DNArch views neural architectures as continuous multidimensional entities, and uses learnable differentiable masks along each dimension to control their size. Unlike existing methods, DNArch is not limited to a predefined set of possible neural components, but instead it is able to discover entire CNN architectures across all feasible combinations of kernel sizes, widths, depths and downsampling. Empirically, DNArch finds performant CNN architectures for several classification and dense prediction tasks on sequential and image data. When combined with a loss term that controls the network complexity, DNArch constrains its search to architectures that respect a predefined computational budget during training.


AdjointDPM: Adjoint Sensitivity Method for Gradient Backpropagation of Diffusion Probabilistic Models

arXiv.org Artificial Intelligence

Existing customization methods require access to multiple reference examples to align pre-trained diffusion probabilistic models (DPMs) with user-provided concepts. This paper aims to address the challenge of DPM customization when the only available supervision is a differentiable metric defined on the generated contents. Since the sampling procedure of DPMs involves recursive calls to the denoising UNet, na\"ive gradient backpropagation requires storing the intermediate states of all iterations, resulting in extremely high memory consumption. To overcome this issue, we propose a novel method AdjointDPM, which first generates new samples from diffusion models by solving the corresponding probability-flow ODEs. It then uses the adjoint sensitivity method to backpropagate the gradients of the loss to the models' parameters (including conditioning signals, network weights, and initial noises) by solving another augmented ODE. To reduce numerical errors in both the forward generation and gradient backpropagation processes, we further reparameterize the probability-flow ODE and augmented ODE as simple non-stiff ODEs using exponential integration. Finally, we demonstrate the effectiveness of AdjointDPM on three interesting tasks: converting visual effects into identification text embeddings, finetuning DPMs for specific types of stylization, and optimizing initial noise to generate adversarial samples for security auditing.


PaReprop: Fast Parallelized Reversible Backpropagation

arXiv.org Artificial Intelligence

The growing size of datasets and deep learning models has made faster and memory-efficient training crucial. Reversible transformers have recently been introduced as an exciting new method for extremely memory-efficient training, but they come with an additional computation overhead of activation re-computation in the backpropagation phase. We present PaReprop, a fast Parallelized Reversible Backpropagation algorithm that parallelizes the additional activation re-computation overhead in reversible training with the gradient computation itself in backpropagation phase. We demonstrate the effectiveness of the proposed PaReprop algorithm through extensive benchmarking across model families (ViT, MViT, Swin and RoBERTa), data modalities (Vision & NLP), model sizes (from small to giant), and training batch sizes. Our empirical results show that PaReprop achieves up to 20% higher training throughput than vanilla reversible training, largely mitigating the theoretical overhead of 25% lower throughput from activation recomputation in reversible training. Project page: https://tylerzhu.com/pareprop.


Can Forward Gradient Match Backpropagation?

arXiv.org Artificial Intelligence

Forward Gradients - the idea of using directional derivatives in forward differentiation mode - have recently been shown to be utilizable for neural network training while avoiding problems generally associated with backpropagation gradient computation, such as locking and memorization requirements. The cost is the requirement to guess the step direction, which is hard in high dimensions. While current solutions rely on weighted averages over isotropic guess vector distributions, we propose to strongly bias our gradient guesses in directions that are much more promising, such as feedback obtained from small, local auxiliary networks. For a standard computer vision neural network, we conduct a rigorous study systematically covering a variety of combinations of gradient targets and gradient guesses, including those previously presented in the literature. We find that using gradients obtained from a local loss as a candidate direction drastically improves on random noise in Forward Gradient methods.


Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability

arXiv.org Artificial Intelligence

Is the lottery ticket phenomenon an idiosyncrasy of gradient-based training or does it generalize to evolutionary optimization? In this paper we establish the existence of highly sparse trainable initializations for evolution strategies (ES) and characterize qualitative differences compared to gradient descent (GD)-based sparse training. We introduce a novel signal-to-noise iterative pruning procedure, which incorporates loss curvature information into the network pruning step. This can enable the discovery of even sparser trainable network initializations when using black-box evolution as compared to GD-based optimization. Furthermore, we find that these initializations encode an inductive bias, which transfers across different ES, related tasks and even to GD-based training. Finally, we compare the local optima resulting from the different optimization paradigms and sparsity levels. In contrast to GD, ES explore diverse and flat local optima and do not preserve linear mode connectivity across sparsity levels and independent runs. The results highlight qualitative differences between evolution and gradient-based learning dynamics, which can be uncovered by the study of iterative pruning procedures.


Does Federated Learning Really Need Backpropagation?

arXiv.org Artificial Intelligence

Federated learning (FL) is a general principle for decentralized clients to train a server model collectively without sharing local data. FL is a promising framework with practical applications, but its standard training paradigm requires the clients to backpropagate through the model to compute gradients. Since these clients are typically edge devices and not fully trusted, executing backpropagation on them incurs computational and storage overhead as well as white-box vulnerability. In light of this, we develop backpropagation-free federated learning, dubbed BAFFLE, in which backpropagation is replaced by multiple forward processes to estimate gradients. BAFFLE is 1) memory-efficient and easily fits uploading bandwidth; 2) compatible with inference-only hardware optimization and model quantization or pruning; and 3) well-suited to trusted execution environments, because the clients in BAFFLE only execute forward propagation and return a set of scalars to the server. Empirically we use BAFFLE to train deep models from scratch or to finetune pretrained models, achieving acceptable results. Code is available in https://github.com/FengHZ/BAFFLE.


Ensemble Learning Model on Artificial Neural Network-Backpropagation (ANN-BP) Architecture for Coal Pillar Stability Classification

arXiv.org Artificial Intelligence

Pillars are important structural units used to ensure mining safety in underground hard rock mines. Therefore, precise predictions regarding the stability of underground pillars are required. One common index that is often used to assess pillar stability is the Safety Factor (SF). Unfortunately, such crisp boundaries in pillar stability assessment using SF are unreliable. This paper presents a novel application of Artificial Neural Network-Backpropagation (ANN-BP) and Deep Ensemble Learning for pillar stability classification. There are three types of ANN-BP used for the classification of pillar stability distinguished by their activation functions: ANN-BP ReLU, ANN-BP ELU, and ANN-BP GELU. This research also presents a new labeling alternative for pillar stability by considering its suitability with the SF. Thus, pillar stability is expanded into four categories: failed with a suitable safety factor, intact with a suitable safety factor, failed without a suitable safety factor, and intact without a suitable safety factor. There are five inputs used for each model: pillar width, mining height, bord width, depth to floor, and ratio. The results showed that the ANN-BP model with Ensemble Learning could improve ANN-BP performance with an average accuracy of 86.48% and an F_2-score of 96.35% for the category of failed with a suitable safety factor.


The Integrated Forward-Forward Algorithm: Integrating Forward-Forward and Shallow Backpropagation With Local Losses

arXiv.org Artificial Intelligence

Peking University Abstract: The backpropagation algorithm, despite its widespread use in neural network learning, may not accurately emulate the human cortex's learning process. However, the original FFA paper and related works on the Forward-Forward Algorithm only mentioned very limited types of neural network mechanisms and may limit its application and effectiveness. In response to these challenges, we propose an integrated method that combines the strengths of both FFA and shallow backpropagation, yielding a biologically plausible neural network training algorithm which can also be applied to various network structures. We applied this integrated approach to the classification of the Modified National Institute of Standards and Technology (MNIST) database, where it outperformed FFA and demonstrated superior resilience to noise compared to backpropagation. We show that training neural networks with the Integrated Forward-Forward Algorithm has the potential of generating neural networks with advantageous features like robustness. 1. Introduction For the past decade, Deep learning has made significant progress on countless tasks and problems, with backpropagation as a key contributing factor to train the deep models developed. However, in aspects of biological plausibility, although the model of deep neural networks for artificial intelligence is initially inspired by biological neurons' structures, the training process of these deep neural networks using backpropagation throughout the whole neural network does not seem biologically plausible. Researchers have argued that the long range, or global, backpropagation of gradient information has little evidence in neuron science and violates many principles of the cortex learning found in neural science, like locality and onlinety. The original Forward-Forward Algorithm is one of the approaches in biological plausible learning as an alternative of backpropagation to train neural networks initially proposed by Hinton(2022) [1].