Goto

Collaborating Authors

 Backpropagation


Model-Based Machine Learning for Joint Digital Backpropagation and PMD Compensation

arXiv.org Machine Learning

In this paper, we propose a model-based machine-learning approach for dual-polarization systems by parameterizing the split-step Fourier method for the Manakov-PMD equation. The resulting method combines hardware-friendly time-domain nonlinearity mitigation via the recently proposed learned digital backpropagation (LDBP) with distributed compensation of polarization-mode dispersion (PMD). We refer to the resulting approach as LDBP-PMD. We train LDBP-PMD on multiple PMD realizations and show that it converges within 1% of its peak dB performance after 428 training iterations on average, yielding a peak effective signal-to-noise ratio of only 0.30 dB below the PMD-free case. Similar to state-of-the-art lumped PMD compensation algorithms in practical systems, our approach does not assume any knowledge about the particular PMD realization along the link, nor any knowledge about the total accumulated PMD. This is a significant improvement compared to prior work on distributed PMD compensation, where knowledge about the accumulated PMD is typically assumed. We also compare different parameterization choices in terms of performance, complexity, and convergence behavior. Lastly, we demonstrate that the learned models can be successfully retrained after an abrupt change of the PMD realization along the fiber.


An Alternative to Backpropagation in Deep Reinforcement Learning

arXiv.org Artificial Intelligence

State-of-the-art deep learning algorithms mostly rely on gradient backpropagation to train a deep artificial neural network, which is generally regarded to be biologically implausible. For a network of stochastic units trained on a reinforcement learning task or a supervised learning task, one biologically plausible way of learning is to train each unit by REINFORCE. In this case, only a global reward signal has to be broadcast to all units, and the learning rule given is local, which can be interpreted as reward-modulated spike-timing-dependent plasticity (R-STDP) that is observed biologically. Although this learning rule follows the gradient of return in expectation, it suffers from high variance and cannot be used to train a deep network in practice. In this paper, we propose an algorithm called MAP propagation that can reduce this variance significantly while retaining the local property of learning rule. Different from prior works on local learning rules (e.g. Contrastive Divergence) which mostly applies to undirected models in unsupervised learning tasks, our proposed algorithm applies to directed models in reinforcement learning tasks. We show that the newly proposed algorithm can solve common reinforcement learning tasks at a speed similar to that of backpropagation when applied to an actor-critic network.


Interlocking Backpropagation: Improving depthwise model-parallelism

arXiv.org Artificial Intelligence

The number of parameters in state of the art neural networks has drastically increased in recent years. This surge of interest in large scale neural networks has motivated the development of new distributed training strategies enabling such models. One such strategy is model-parallel distributed training. Unfortunately, model-parallelism suffers from poor resource utilisation, which leads to wasted resources. In this work, we improve upon recent developments in an idealised model-parallel optimisation setting: local learning. Motivated by poor resource utilisation, we introduce a class of intermediary strategies between local and global learning referred to as interlocking backpropagation. These strategies preserve many of the compute-efficiency advantages of local optimisation, while recovering much of the task performance achieved by global optimisation. We assess our strategies on both image classification ResNets and Transformer language models, finding that our strategy consistently out-performs local learning in terms of task performance, and out-performs global learning in training efficiency.


QR and LQ Decomposition Matrix Backpropagation Algorithms for Square, Wide, and Deep Matrices and Their Software Implementation

arXiv.org Machine Learning

This article presents matrix backpropagation algorithms for the QR decomposition of matrices $A_{m, n}$, that are either square (m = n), wide (m < n), or deep (m > n), with rank $k = min(m, n)$. Furthermore, we derive novel matrix backpropagation results for the pivoted (full-rank) QR decomposition and for the LQ decomposition of deep input matrices. Differentiable QR decomposition offers a numerically stable, computationally efficient method to solve least squares problems frequently encountered in machine learning and computer vision. Software implementation across popular deep learning frameworks (PyTorch, TensorFlow, MXNet) incorporate the methods for general use within the deep learning community. Furthermore, this article aids the practitioner in understanding the matrix backpropagation methodology as part of larger computational graphs, and hopefully, leads to new lines of research.


Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain

arXiv.org Artificial Intelligence

Can the powerful backpropagation of error (backprop) reinforcement learning algorithm be formulated in a manner suitable for implementation in neural circuitry? The primary challenge is to ensure that any candidate formulation uses only local information, rather than relying on global (error) signals, as in orthodox backprop. Recently several algorithms for approximating backprop using only local signals, such as predictive coding and equilibrium-prop, have been proposed. However, these algorithms typically impose other requirements which challenge biological plausibility: for example, requiring complex and precise connectivity schemes (predictive coding), or multiple sequential backwards phases with information being stored across phases (equilibrium-prop). Here, we propose a novel local algorithm, Activation Relaxation (AR), which is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system. Our algorithm converges robustly and exactly to the correct backpropagation gradients, requires only a single type of neuron, utilises only a single backwards phase, and can perform credit assignment on arbitrary computation graphs. We illustrate these properties by training deep neural networks on visual classification tasks, and we describe simplifications to the algorithm which remove further obstacles to neurobiological implementation (for example, the weight-transport problem, and the use of nonlinear derivatives), while preserving performance.


All the Backpropagation derivatives

#artificialintelligence

This is easy to solve as we already computed'dz' and the second term is simply the derivative of'z' which is'wX b' w.r.t'b' which is simply 1! For completeness we will also show how to calculate'db' directly. To calculate this we will take a step from the above calculation for'dw', (from just before we did the differentiation) Taking the LHS first, the derivative of'wX' w.r.t'b' is zero as it doesn't contain b! The derivative of'b' is simply 1, so we are just left with the'y' outside the parenthesis.


RIFLE: Backpropagation in Depth for Deep Transfer Learning through Re-Initializing the Fully-connected LayEr

arXiv.org Machine Learning

Fine-tuning the deep convolution neural network(CNN) using a pre-trained model helps transfer knowledge learned from larger datasets to the target task. While the accuracy could be largely improved even when the training dataset is small, the transfer learning outcome is usually constrained by the pre-trained model with close CNN weights (Liu et al., 2019), as the backpropagation here brings smaller updates to deeper CNN layers. In this work, we propose RIFLE - a simple yet effective strategy that deepens backpropagation in transfer learning settings, through periodically Re-Initializing the Fully-connected LayEr with random scratch during the fine-tuning procedure. RIFLE brings meaningful updates to the weights of deep CNN layers and improves low-level feature learning, while the effects of randomization can be easily converged throughout the overall learning procedure. The experiments show that the use of RIFLE significantly improves deep transfer learning accuracy on a wide range of datasets, out-performing known tricks for the similar purpose, such as Dropout, DropConnect, StochasticDepth, Disturb Label and Cyclic Learning Rate, under the same settings with 0.5% -2% higher testing accuracy. Empirical cases and ablation studies further indicate RIFLE brings meaningful updates to deep CNN layers with accuracy improved.


AAD & Backpropagation in Machine Learning and Finance, Explained in 15min -- Antoine Savine

#artificialintelligence

DataOps: How Bell Canada Powers their Business with Data - July 15 Demand for data outstrips the capacity of IT organizations and data engineering teams to deliver. The enabling technologies exist today and data management practices are moving quickly toward a future of DataOps. DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics. DataOps: How Bell Canada Powers their Business with Data - July 15 Demand for data outstrips the capacity of IT organizations and data engineering teams to deliver. The enabling technologies exist today and data management practices are moving quickly toward a future of DataOps.


Building A Mental Model for Backpropagation

#artificialintelligence

As the beating heart of deep learning, a solid understanding of backpropagation is required for any deep learning practitioner. Although there are a lot of good resources that explain backpropagation on the internet already, most of them explain from very different angles and each is good for a certain type of audience. In this post, I'm going to combine intuition, animated graphs and code together for beginners and intermediate level students of deep learning for easier consumption. A good assessment of the understanding of any algorithm is whether you can code it out yourself from scratch. After reading this post, you should have an idea of how to implement your own version of backpropagation in Python. Mathematically, backpropagation is the process of computing gradients for the components of a function by applying the chain rule.


Enabling Deep Spiking Neural Networks with Hybrid Conversion and Spike Timing Dependent Backpropagation

arXiv.org Machine Learning

Spiking Neural Networks (SNNs) operate with asynchronous discrete events (or spikes) which can potentially lead to higher energy-efficiency in neuromorphic hardware implementations. Many works have shown that an SNN for inference can be formed by copying the weights from a trained Artificial Neural Network (ANN) and setting the firing threshold for each layer as the maximum input received in that layer. These type of converted SNNs require a large number of time steps to achieve competitive accuracy which diminishes the energy savings. The number of time steps can be reduced by training SNNs with spike-based backpropagation from scratch, but that is computationally expensive and slow. To address these challenges, we present a computationally-efficient training technique for deep SNNs. We propose a hybrid training methodology: 1) take a converted SNN and use its weights and thresholds as an initialization step for spike-based backpropagation, and 2) perform incremental spike-timing dependent backpropagation (STDB) on this carefully initialized network to obtain an SNN that converges within few epochs and requires fewer time steps for input processing. STDB is performed with a novel surrogate gradient function defined using neuron's spike time. The proposed training methodology converges in less than 20 epochs of spike-based backpropagation for most standard image classification datasets, thereby greatly reducing the training complexity compared to training SNNs from scratch. We perform experiments on CIFAR-10, CIFAR-100, and ImageNet datasets for both VGG and ResNet architectures. We achieve top-1 accuracy of 65.19% for ImageNet dataset on SNN with 250 time steps, which is 10X faster compared to converted SNNs with similar accuracy.