Closed-Form Feedback-Free Learning with Forward Projection
O'Shea, Robert, Rajendran, Bipin
State-of-the-art methods for backpropagation-free learning employ local error feedback to direct iterative optimisation via gradient descent. In this study, we examine the more restrictive setting where retrograde communication from neuronal outputs is unavailable for pre-synaptic weight optimisation. To address this challenge, we propose Forward Projection (FP), a novel randomised closed-form training method that requires only a single forward pass over the entire dataset for model fitting, without retrograde communication. Target values for pre-activation membrane potentials are generated layer-wise via nonlinear projections of pre-synaptic inputs and the labels. Local loss functions are optimised over pre-synaptic inputs using closed-form regression, without feedback from neuronal outputs or downstream layers. Interpretability is a key advantage of FP training: membrane potentials of hidden neurons in FP-trained networks encode information which is interpretable layer-wise as label predictions. We demonstrate the effectiveness of FP across four biomedical datasets. In few-shot learning tasks, FP yielded more generalisable models than those optimised via backpropagation. In large-sample tasks, FP-based models achieved generalisation comparable to gradient descent-based local learning methods while requiring only a single forward propagation step, yielding a significant speed-up in training. Interpretation functions defined on local neuronal activity in FP-based models successfully identified clinically salient features for diagnosis in two biomedical datasets. Forward Projection is a computationally efficient machine learning approach that yields interpretable neural network models without retrograde communication of neuronal activity during training.
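To make the closed-form recipe concrete, here is a minimal sketch of layer-wise fitting with randomly projected targets. The tanh nonlinearity, the ridge regulariser `lam`, and the target construction are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def forward_projection_fit(X, Y, widths, lam=1e-2, seed=0):
    """One-pass, feedback-free layer-wise fitting (illustrative sketch).

    X: input matrix (n_samples x n_features); Y: one-hot label matrix.
    Each layer's pre-activation targets are a random nonlinear projection
    of its pre-synaptic inputs and the labels; weights are then obtained
    by closed-form ridge regression, with no gradients or feedback.
    """
    rng = np.random.default_rng(seed)
    weights, H = [], X
    for width in widths:
        Z = np.hstack([H, Y])                        # pre-synaptic inputs + labels
        R = rng.standard_normal((Z.shape[1], width)) # fixed random projection
        T = np.tanh(Z @ R)                           # membrane-potential targets
        A = H.T @ H + lam * np.eye(H.shape[1])       # ridge normal equations
        W = np.linalg.solve(A, H.T @ T)              # closed-form local solution
        H = np.tanh(H @ W)                           # forward pass to next layer
        weights.append(W)
    return weights
```

Because each layer's weights are solved in closed form from its own inputs and the labels, no gradients or retrograde signals are required at any point.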
Efficient Deployment of Transformer Models in Analog In-Memory Computing Hardware
Li, Chen, Lammie, Corey, Le Gallo, Manuel, Rajendran, Bipin
Analog in-memory computing (AIMC) has emerged as a promising solution to overcome the von Neumann bottleneck, accelerating neural network computations and improving computational efficiency. While AIMC has demonstrated success with architectures such as CNNs, MLPs, and RNNs, deploying transformer-based models using AIMC presents unique challenges. Transformers are expected to handle diverse downstream tasks and adapt to new user data or instructions after deployment, which requires more flexible approaches to suit AIMC constraints. In this paper, we propose a novel method for deploying pre-trained transformer models onto AIMC hardware. Unlike traditional approaches requiring hardware-aware training, our technique allows direct deployment without the need for retraining the original model. Instead, we utilize lightweight, low-rank adapters -- compact modules stored in digital cores -- to adapt the model to hardware constraints. We validate our approach on MobileBERT, demonstrating accuracy on par with, or even exceeding, a traditional hardware-aware training approach. Our method is particularly appealing in multi-task scenarios, as it enables a single analog model to be reused across multiple tasks. Moreover, it supports on-chip adaptation to new hardware constraints and tasks without updating analog weights, providing a flexible and versatile solution for real-world AI applications. Code is available.
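A minimal sketch of the deployment idea follows, assuming a LoRA-style digital correction around a frozen analog matrix; the Gaussian noise model, the rank, and the class and parameter names are hypothetical stand-ins for the actual hardware behaviour.

```python
import numpy as np

class AnalogLinearWithAdapter:
    """Frozen analog weights plus a trainable digital low-rank adapter
    (illustrative sketch; noise model and rank are assumptions)."""

    def __init__(self, W, rank=8, noise_std=0.02, seed=0):
        rng = np.random.default_rng(seed)
        # Deployed analog weights: perturbed by hardware noise, never retrained.
        self.W_analog = W + noise_std * rng.standard_normal(W.shape)
        # Compact adapter matrices, stored and trained in digital cores.
        self.A = 0.01 * rng.standard_normal((W.shape[0], rank))
        self.B = np.zeros((rank, W.shape[1]))  # zero-init: adapter starts as a no-op

    def __call__(self, x):
        # Analog matrix product corrected by the low-rank digital path.
        return x @ self.W_analog + (x @ self.A) @ self.B
```

Swapping only the small (A, B) pair would let a single analog model serve multiple tasks, which is the multi-task reuse described above.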
Neuromorphic Wireless Split Computing with Multi-Level Spikes
Wu, Dengyu, Chen, Jiechen, Rajendran, Bipin, Poor, H. Vincent, Simeone, Osvaldo
Inspired by biological processes, neuromorphic computing utilizes spiking neural networks (SNNs) to perform inference tasks, offering significant efficiency gains for workloads involving sequential data. Recent advances in hardware and software have demonstrated that embedding a few bits of payload in each spike exchanged between the spiking neurons can further enhance inference accuracy. In a split computing architecture, where the SNN is divided across two separate devices, the device storing the first layers must share information about the spikes generated by the local output neurons with the other device. Consequently, the advantages of multi-level spikes must be balanced against the challenges of transmitting additional bits between the two devices. For this system, we present the design of digital and analog modulation schemes optimized for an orthogonal frequency division multiplexing (OFDM) radio interface. Simulation and experimental results using software-defined radios provide insights into the performance gains of multi-level SNN models and the optimal payload size as a function of the quality of the connection between a transmitter and receiver.
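As a rough illustration of the trade-off discussed above, the sketch below models a graded activation quantised into a k-bit payload per spike; the quantiser and the assumption that values lie in [0, 1) are illustrative choices, not the paper's scheme.

```python
import numpy as np

def encode_multilevel_spikes(values, spikes, bits=2):
    """Attach a k-bit payload to each spike (illustrative model).

    values: graded activations assumed in [0, 1); spikes: binary spike mask.
    Returns the integer payloads and the bit count the radio link must carry.
    """
    levels = 2 ** bits
    payloads = np.minimum((values * levels).astype(int), levels - 1)
    payloads = payloads * spikes          # only spiking neurons transmit
    n_bits = int(spikes.sum()) * bits     # transmission cost over the split link
    return payloads, n_bits
```

Raising `bits` improves the fidelity of each spike but increases the number of bits the OFDM link must carry, which is exactly the balance the modulation schemes above are designed around.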
Baseline Drift Tolerant Signal Encoding for ECG Classification with Deep Learning
O'Shea, Robert, Katti, Prabodh, Rajendran, Bipin
Common artefacts such as baseline drift, rescaling, and noise critically limit the performance of machine learning-based automated ECG analysis and interpretation. This study proposes Derived Peak (DP) encoding, a non-parametric method that generates signed spikes corresponding to zero crossings of the signal's first and second-order time derivatives. Notably, DP encoding is invariant to shift and scaling artefacts, and its implementation is further simplified by the absence of user-defined parameters. DP encoding was used to encode the 12-lead ECG data from the PTB-XL dataset (n=18,869 participants) and was fed to 1D-ResNet-18 models trained to identify myocardial infarction, conductive deficits and ST-segment abnormalities. Robustness to artefacts was assessed by corrupting ECG data with sinusoidal baseline drift, shift, rescaling and noise, before encoding. The addition of these artefacts resulted in a significant drop in accuracy for seven other methods from prior art, while DP encoding maintained its baseline AUC of 0.88 under drift, shift and rescaling. DP achieved superior performance to unencoded inputs in the presence of shift (AUC under 1 mV shift: 0.91 vs 0.62) and rescaling artefacts (AUC 0.91 vs 0.79). Thus, DP encoding is a simple method by which robustness to common ECG artefacts may be improved for automated ECG analysis and interpretation.
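Since DP encoding is fully specified by zero crossings, a short sketch captures it; the discrete-difference derivatives and sign conventions below are illustrative choices rather than the paper's reference implementation.

```python
import numpy as np

def derived_peak_encode(x):
    """DP encoding sketch: signed spikes at the zero crossings of the
    signal's first and second time derivatives (discrete differences here).
    """
    d1 = np.diff(x, n=1)
    d2 = np.diff(x, n=2)

    def signed_crossings(d, length):
        spikes = np.zeros(length, dtype=int)
        s = np.sign(d)
        idx = np.where(s[:-1] * s[1:] < 0)[0] + 1  # sample after each crossing
        spikes[idx] = s[idx].astype(int)           # sign gives crossing direction
        return spikes

    # Adding a constant offset or positively rescaling x leaves the locations
    # and signs of these zero crossings unchanged: shift/scale invariance.
    return signed_crossings(d1, len(x)), signed_crossings(d2, len(x))
```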
Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks
Song, Zihang, Katti, Prabodh, Simeone, Osvaldo, Rajendran, Bipin
Spiking Neural Networks (SNNs) have recently been integrated into Transformer architectures due to their potential to reduce computational demands and to improve power efficiency. Yet, the implementation of the attention mechanism using spiking signals on general-purpose computing platforms remains inefficient. In this paper, we propose a novel framework leveraging stochastic computing (SC) to effectively execute the dot-product attention for SNN-based Transformers. We demonstrate that our approach can achieve high classification accuracy ($83.53\%$) on CIFAR-10 within 10 time steps, which is comparable to the performance of a baseline artificial neural network implementation ($83.66\%$). We estimate that the proposed SC approach can lead to over $6.3\times$ reduction in computing energy and $1.7\times$ reduction in memory access costs for a digital CMOS-based ASIC design. We experimentally validate our stochastic attention block design through an FPGA implementation, which is shown to achieve $48\times$ lower latency as compared to a GPU implementation, while consuming $15\times$ less power.
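The stochastic-computing primitive underlying the attention block can be sketched as follows, assuming values normalised to [0, 1] and an illustrative bitstream length; this shows the principle, not the paper's hardware design.

```python
import numpy as np

def sc_dot(a, b, T=1024, seed=0):
    """Stochastic-computing dot product (illustrative sketch).

    Values in [0, 1] are represented as length-T Bernoulli bitstreams;
    elementwise multiplication reduces to a bitwise AND, and averaging
    the bit counts gives an unbiased estimate of the dot product a . b.
    """
    rng = np.random.default_rng(seed)
    A = rng.random((T, a.size)) < a       # bitstream for each element of a
    B = rng.random((T, b.size)) < b       # bitstream for each element of b
    return (A & B).mean(axis=0).sum()     # E[a_i AND b_i] = a_i * b_i
```

In hardware, this AND-and-count structure replaces multipliers, which is the source of the energy and latency savings reported above.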
Towards Efficient and Trustworthy AI Through Hardware-Algorithm-Communication Co-Design
Rajendran, Bipin, Simeone, Osvaldo, Al-Hashimi, Bashir M.
Artificial intelligence (AI) algorithms based on neural networks have been designed for decades with the goal of maximising some measure of accuracy. This has led to two undesired effects. First, model complexity has risen exponentially when measured in terms of computation and memory requirements. Second, state-of-the-art AI models are largely incapable of providing trustworthy measures of their uncertainty, possibly `hallucinating' their answers and discouraging their adoption for decision-making in sensitive applications. With the goal of realising efficient and trustworthy AI, in this paper we highlight research directions at the intersection of hardware and software design that integrate physical insights into computational substrates, neuroscientific principles concerning efficient information processing, information-theoretic results on optimal uncertainty quantification, and communication-theoretic guidelines for distributed processing. Overall, the paper advocates for novel design methodologies that target not only accuracy but also uncertainty quantification, while leveraging emerging computing hardware architectures that move beyond the traditional von Neumann digital computing paradigm to embrace in-memory, neuromorphic, and quantum computing technologies. An important overarching principle of the proposed approach is to view the stochasticity inherent in the computational substrate and in the communication channels between processors as a resource to be leveraged for the purpose of representing and processing classical and quantum uncertainty.
Energy-Efficient On-Board Radio Resource Management for Satellite Communications via Neuromorphic Computing
Ortiz, Flor, Skatchkovsky, Nicolas, Lagunas, Eva, Martins, Wallace A., Eappen, Geoffrey, Daoud, Saed, Simeone, Osvaldo, Rajendran, Bipin, Chatzinotas, Symeon
The latest satellite communication (SatCom) missions are characterized by a fully reconfigurable on-board software-defined payload, capable of adapting radio resources to the temporal and spatial variations of the system traffic. As pure optimization-based solutions have proven to be computationally tedious and to lack flexibility, machine learning (ML)-based methods have emerged as promising alternatives. We investigate the application of energy-efficient brain-inspired ML models for on-board radio resource management. Apart from software simulation, we report extensive experimental results leveraging the recently released Intel Loihi 2 chip. To benchmark the performance of the proposed model, we implement conventional convolutional neural networks (CNN) on a Xilinx Versal VCK5000, and provide a detailed comparison of accuracy, precision, recall, and energy efficiency for different traffic demands. Most notably, for relevant workloads, spiking neural networks (SNNs) implemented on Loihi 2 yield higher accuracy, while reducing power consumption by more than 100$\times$ as compared to the CNN-based reference platform. Our findings point to the significant potential of neuromorphic computing and SNNs in supporting on-board SatCom operations, paving the way for enhanced efficiency and sustainability in future SatCom systems.
A Convolutional Spiking Network for Gesture Recognition in Brain-Computer Interfaces
Ai, Yiming, Rajendran, Bipin
Brain-computer interfaces are being explored for a wide variety of therapeutic applications. Typically, this involves measuring and analyzing continuous-time electrical brain activity via techniques such as electrocorticography (ECoG) or electroencephalography (EEG) to drive external devices. However, due to the inherent noise and variability in the measurements, the analysis of these signals is challenging and requires offline processing with significant computational resources. In this paper, we propose a simple yet efficient machine learning-based approach for the exemplary problem of hand gesture classification based on brain signals. We adopt a hybrid machine learning approach that uses a convolutional spiking neural network employing a bio-inspired event-driven synaptic plasticity rule for unsupervised feature learning of the measured analog signals encoded in the spike domain. We demonstrate that this approach generalizes to different subjects with both EEG and ECoG data and achieves superior accuracy in the range of 92.74-97.07% in identifying different hand gesture classes and motor imagery tasks.
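As a sketch of the family of event-driven plasticity rules named above (the paper's exact rule is not reproduced here), consider a pair-based STDP update with decaying traces; the constants, trace form, and clipping range are illustrative assumptions, applied elementwise per synapse.

```python
import numpy as np

def stdp_step(w, pre_spike, post_spike, pre_trace, post_trace,
              a_plus=0.01, a_minus=0.012, tau=0.9):
    """One event-driven pair-based STDP update (illustrative constants).

    Traces decay each step and record recent spiking; potentiation occurs
    when a post-synaptic spike follows pre-synaptic activity, depression
    when a pre-synaptic spike follows post-synaptic activity. All inputs
    are arrays of the same per-synapse shape.
    """
    pre_trace = tau * pre_trace + pre_spike
    post_trace = tau * post_trace + post_spike
    w = w + a_plus * post_spike * pre_trace   # pre-before-post: strengthen
    w = w - a_minus * pre_spike * post_trace  # post-before-pre: weaken
    return np.clip(w, 0.0, 1.0), pre_trace, post_trace
```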
Bayesian Inference on Binary Spiking Networks Leveraging Nanoscale Device Stochasticity
Katti, Prabodh, Skatchkovsky, Nicolas, Simeone, Osvaldo, Rajendran, Bipin, Al-Hashimi, Bashir M.
Bayesian Neural Networks (BNNs) can overcome the problem of overconfidence that plagues traditional frequentist deep neural networks, and are hence considered to be a key enabler for reliable AI systems. In this paper, we introduce a novel Phase Change Memory (PCM)-based hardware implementation for BNNs with binary synapses. The proposed architecture consists of separate weight and noise planes, in which PCM cells are configured and operated to represent the nominal values of the weights and to generate the required noise for sampling, respectively. We obtain hardware accuracy and expected calibration error matching that of an 8-bit fixed-point (FxP8) implementation, with projected savings of over 9$\times$ in terms of core area transistor count.
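A minimal sketch of the sampling principle described above, with Gaussian read noise as an illustrative stand-in for the measured PCM noise characteristics supplied by the noise plane:

```python
import numpy as np

def sample_binary_weights(mu, noise_std=0.1, rng=None):
    """Draw one binary-synapse network sample (illustrative noise model).

    mu holds the nominal weight-plane values; Gaussian noise stands in for
    PCM read noise from the noise plane. Thresholding yields binary
    synapses, so each forward pass is a fresh Monte Carlo sample.
    """
    rng = np.random.default_rng() if rng is None else rng
    return np.sign(mu + noise_std * rng.standard_normal(mu.shape))
```

Averaging predictions over several such weight samples is what produces the calibrated uncertainty estimates that motivate the design.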
Hybrid In-memory Computing Architecture for the Training of Deep Neural Networks
Joshi, Vinay, He, Wangxin, Seo, Jae-sun, Rajendran, Bipin
The cost involved in training deep neural networks (DNNs) on von Neumann architectures has motivated the development of novel solutions for efficient DNN training accelerators. We propose a hybrid in-memory computing (HIC) architecture for the training of DNNs on hardware accelerators that results in memory-efficient inference and outperforms baseline software accuracy in benchmark tasks. We introduce a weight representation technique that exploits both binary and multi-level phase-change memory (PCM) devices, leading to a memory-efficient inference accelerator. Unlike previous in-memory computing-based implementations, we use a low-precision weight update accumulator that results in further memory savings. We trained the ResNet-32 network to classify CIFAR-10 images using HIC. For a comparable model size, HIC-based training outperforms the baseline network trained in floating-point 32-bit (FP32) precision by leveraging an appropriate network width multiplier. Furthermore, we observe that HIC-based training results in about 50% smaller inference model size while achieving accuracy comparable to the baseline. We also show that the temporal drift in PCM devices has a negligible effect on post-training inference accuracy for extended periods (one year). Finally, our simulations indicate that HIC-based training naturally ensures that the number of write-erase cycles seen by the devices is a small fraction of the endurance limit of PCM, demonstrating the feasibility of this architecture for realising hardware platforms that can learn in the field.
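The low-precision update accumulator can be sketched as follows; the thresholding semantics and function names are illustrative assumptions, not the paper's exact scheme.

```python
import numpy as np

def accumulate_and_program(acc, grad, lr=0.01, threshold=1.0):
    """Low-precision update accumulation with thresholded device writes
    (illustrative sketch).

    Small gradient contributions are summed digitally; a PCM write is
    issued only when the running sum crosses the threshold, which keeps
    write-erase traffic far below the device endurance limit.
    """
    acc = acc + lr * grad
    n_steps = np.trunc(acc / threshold)   # discrete weight increments to program
    acc = acc - n_steps * threshold       # carry the sub-threshold residual
    return acc, n_steps                   # n_steps drives the PCM writes
```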