Conductance state


In-memory Training on Analog Devices with Limited Conductance States via Multi-tile Residual Learning

Li, Jindan, Wu, Zhaoxian, Liu, Gaowen, Gokmen, Tayfun, Chen, Tianyi

arXiv.org Artificial Intelligence

Analog in-memory computing (AIMC) accelerators enable efficient deep neural network computation directly within memory using resistive crossbar arrays, where model parameters are represented by the conductance states of memristive devices. However, effective in-memory training typically requires at least 8-bit conductance states to match digital baselines. Realizing such fine-grained states is costly and often requires complex noise-mitigation techniques that increase circuit complexity and energy consumption. In practice, many promising memristive devices such as ReRAM offer only about 4-bit resolution due to fabrication constraints, and this limited update precision substantially degrades training accuracy. To enable on-chip training with these limited-state devices, this paper proposes a residual learning framework that sequentially learns on multiple crossbar tiles to compensate for the residual errors from low-precision weight updates. Our theoretical analysis shows that the optimality gap shrinks with the number of tiles and that the method achieves a linear convergence rate. Experiments on standard image classification benchmarks demonstrate that our method consistently outperforms state-of-the-art in-memory analog training strategies under limited-state settings, while incurring only moderate hardware overhead, as confirmed by our cost analysis.
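As a rough illustration of the multi-tile idea, the sketch below (NumPy, with made-up level counts and per-tile scaling) programs each successive tile to store the quantization residual left by the previous one; the reconstruction error then shrinks geometrically with the number of tiles, mirroring the linear convergence the abstract claims. This is an illustrative sketch, not the paper's implementation.

```python
import numpy as np

def quantize(w, levels=16):
    """Snap values in [-1, 1] to a uniform grid of limited
    conductance states (16 levels, roughly a 4-bit device)."""
    step = 2.0 / (levels - 1)
    return np.clip(np.round(w / step) * step, -1.0, 1.0)

def multi_tile_write(target, num_tiles, levels=16):
    """Hypothetical multi-tile residual write: each tile stores a
    rescaled, quantized copy of the residual left by earlier tiles."""
    tiles, residual = [], target.copy()
    for _ in range(num_tiles):
        scale = max(np.max(np.abs(residual)), 1e-12)  # per-tile analog range
        tile = scale * quantize(residual / scale, levels)
        tiles.append(tile)
        residual = residual - tile
    return tiles, residual

rng = np.random.default_rng(0)
w = rng.uniform(-1, 1, size=(4, 4))
tiles, residual = multi_tile_write(w, num_tiles=3)
# summing the tiles reconstructs w far more accurately than any single
# 4-bit tile could; the residual shrinks with each added tile
```

Each extra tile reduces the worst-case error by roughly the single-tile quantization factor, which is the intuition behind the shrinking optimality gap.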


Running Conventional Automatic Speech Recognition on Memristor Hardware: A Simulated Approach

Rossenbach, Nick, Hilmes, Benedikt, Brackmann, Leon, Gunz, Moritz, Schlüter, Ralf

arXiv.org Artificial Intelligence

Memristor-based hardware offers new possibilities for energy-efficient machine learning (ML) by providing analog in-memory matrix multiplication. Current hardware prototypes cannot fit large neural networks, and related literature covers only small ML models for tasks like MNIST or single-word recognition. Simulation can be used to explore how hardware properties affect larger models, but existing software assumes simplified hardware. We propose a PyTorch-based library built on "Synaptogen" to simulate neural network execution with accurately captured memristor hardware properties. For the first time, we show how an ML system with millions of parameters would behave on memristor hardware, using a Conformer trained on the TED-LIUMv2 speech recognition task as an example. With adjusted quantization-aware training, we limit the relative degradation in word error rate to 25% when using 3-bit weight precision to execute linear operations via simulated analog computation.
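As a loose sketch of what 3-bit weight precision means for a linear operation (this is not the Synaptogen library's actual API, and the quantization scheme here is a simple stand-in), the snippet below fake-quantizes a weight matrix per output row and compares the matmul against full precision:

```python
import numpy as np

def fake_quant(w, bits=3):
    """Symmetric per-row fake quantization, a crude stand-in for the
    limited weight precision of a memristive crossbar."""
    levels = 2 ** (bits - 1) - 1  # 3 bits -> integer levels in [-3, 3]
    scale = np.max(np.abs(w), axis=1, keepdims=True) / levels
    return np.clip(np.round(w / scale), -levels, levels) * scale

rng = np.random.default_rng(1)
w = rng.normal(size=(8, 16))      # a toy linear layer
x = rng.normal(size=(16,))
y_ref = w @ x                     # full-precision reference output
y_q = fake_quant(w, bits=3) @ x   # linear op with 3-bit weights
```

Quantization-aware training would expose the network to exactly this kind of weight rounding during training so that the learned solution tolerates it at execution time.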


In-Memory Learning Automata Architecture using Y-Flash Cell

Ghazal, Omar, Lan, Tian, Ojukwu, Shalman, Krishnamurthy, Komal, Yakovlev, Alex, Shafik, Rishad

arXiv.org Artificial Intelligence

The modern implementation of machine learning architectures faces significant challenges due to frequent data transfer between memory and processing units. In-memory computing, primarily through memristor-based analog computing, offers a promising solution to overcome this von Neumann bottleneck. In this technology, data processing and storage are located inside the memory. Here, we introduce a novel approach that utilizes floating-gate Y-Flash memristive devices manufactured with a standard 180 nm CMOS process. These devices offer attractive features, including analog tunability and moderate device-to-device variation; such characteristics are essential for reliable decision-making in ML applications. This paper applies a new machine learning algorithm, the Tsetlin Machine (TM), to an in-memory processing architecture. The TM's learning element, the Automaton, is mapped onto a single Y-Flash cell, with the Automaton's state range transferred into the Y-Flash's conductance range. Through comprehensive simulations, the proposed hardware implementation of the learning automata, particularly for Tsetlin machines, has demonstrated enhanced scalability and on-edge learning capabilities.
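The Automaton-to-cell mapping can be sketched as a simple linear transfer between the automaton's state range and an assumed conductance window; the state count and conductance bounds below are illustrative placeholders, not device values from the paper.

```python
def state_to_conductance(state, n_states=100, g_min=1e-6, g_max=1e-4):
    """Linearly map an automaton state (1..n_states) onto an assumed
    Y-Flash conductance window [g_min, g_max] in siemens (illustrative)."""
    frac = (state - 1) / (n_states - 1)
    return g_min + frac * (g_max - g_min)

def action(state, n_states=100):
    """Tsetlin automaton decision: 'include' (1) when the state is in
    the upper half of the range, 'exclude' (0) in the lower half."""
    return 1 if state > n_states // 2 else 0
```

Under this mapping, reinforcing or penalizing an automaton amounts to nudging the cell's conductance up or down by one state step.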


Layer Ensemble Averaging for Improving Memristor-Based Artificial Neural Network Performance

Yousuf, Osama, Hoskins, Brian, Ramu, Karthick, Fream, Mitchell, Borders, William A., Madhavan, Advait, Daniels, Matthew W., Dienstfrey, Andrew, McClelland, Jabez J., Lueker-Boden, Martin, Adam, Gina C.

arXiv.org Artificial Intelligence

Artificial neural networks have advanced by scaling their dimensions, but conventional computing faces inefficiency due to the von Neumann bottleneck. This work proposes and experimentally demonstrates layer ensemble averaging, a technique to map pre-trained neural network solutions from software to defective hardware crossbars of emerging memory devices and reliably attain near-software inference performance. The approach is investigated using a custom 20,000-device hardware prototyping platform on a continual learning problem, where a network must learn new tasks without catastrophically forgetting previously learned information. Results demonstrate that, by trading off the number of devices required for layer mapping, layer ensemble averaging can reliably boost defective memristive network performance up to the software baseline. For the investigated problem, the average multi-task classification accuracy improves from 61% to 72% (within 1% of the software baseline) using the proposed approach.

Introduction: The increasing demand for large-scale neural network models has prompted a focused exploration of approaches to optimize model efficiency and accelerate computations. Quantized neural networks, which employ reduced-precision representations for model parameters and activations, have emerged as a promising avenue for achieving significant computational gains without compromising performance. As the community delves into extreme quantization, another frontier in enhancing neural network efficiency unfolds through the exploration of emerging memory-based hardware accelerators. For these reasons, memristor-based neural network accelerators have the potential to transform the capabilities of artificial intelligence and machine learning systems and thereby usher in a new neuromorphic era of intelligent edge computing.
A comprehensive exploration of the interplay between quantized neural networks, dedicated hardware accelerators, and memristive technologies becomes imperative for advancing the capabilities of modern neural network workloads, with the overarching goal of unlocking unprecedented efficiency gains in real-world deep learning applications.
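A minimal sketch of the averaging idea, assuming a toy defect model where a random fraction of devices is stuck at zero (both the defect model and the numbers are illustrative, not the paper's):

```python
import numpy as np

def defective_copy(w, rng, stuck_frac=0.1):
    """Toy defect model: a random fraction of devices is stuck at zero."""
    return w * (rng.random(w.shape) >= stuck_frac)

def ensemble_layer(x, w, k=5, stuck_frac=0.1, seed=0):
    """Layer ensemble averaging: run the same layer on k independently
    defective crossbar copies and average their outputs."""
    rng = np.random.default_rng(seed)
    outs = [defective_copy(w, rng, stuck_frac) @ x for _ in range(k)]
    return np.mean(outs, axis=0)

rng = np.random.default_rng(2)
w = rng.normal(size=(8, 16))
x = rng.normal(size=(16,))
y_ref = w @ x

def mean_err(k, trials=20):
    """Average output error over several random defect patterns."""
    return float(np.mean([
        np.linalg.norm(ensemble_layer(x, w, k=k, seed=s) - y_ref)
        for s in range(trials)
    ]))
# averaging over more copies (larger k) reduces the defect-induced error
# on average, at the cost of k times as many devices per layer
```

This is the trade-off the abstract describes: devices spent on redundant copies buy back accuracy lost to defects.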


Thermal Heating in ReRAM Crossbar Arrays: Challenges and Solutions

Smagulova, Kamilya, Fouda, Mohammed E., Eltawil, Ahmed

arXiv.org Artificial Intelligence

The higher speed, scalability and parallelism offered by ReRAM crossbar arrays foster the development of ReRAM-based next-generation AI accelerators. At the same time, the sensitivity of ReRAM to temperature variations decreases the R_on/R_off ratio and negatively affects the achieved accuracy and reliability of the hardware. Various works on temperature-aware optimization and remapping in ReRAM crossbar arrays report up to 58% improvement in accuracy and 2.39x ReRAM lifetime enhancement. This paper classifies the challenges caused by thermal heating, from constraints on ReRAM cells' dimensions and characteristics to their placement in the architecture. In addition, it reviews available solutions designed to mitigate the impact of these challenges, including emerging temperature-resilient DNN training methods. Our work also provides a summary of the techniques and their advantages and limitations.


Neuromorphic computing with multi-memristive synapses

#artificialintelligence

The human brain, with less than 20 W of power consumption, offers a processing capability that exceeds the petaflops mark, and thus outperforms state-of-the-art supercomputers by several orders of magnitude in terms of energy efficiency and volume. Building ultra-low-power cognitive computing systems inspired by the operating principles of the brain is a promising avenue towards achieving such efficiency. Recently, deep learning has revolutionized the field of machine learning by providing human-like performance in areas such as computer vision, speech recognition, and complex strategic games. However, current hardware implementations of deep neural networks are still far from competing with biological neural systems in terms of real-time information-processing capabilities with comparable energy consumption. One of the reasons for this inefficiency is that most neural networks are implemented on computing systems based on the conventional von Neumann architecture, with separate memory and processing units.


Searching for the Perfect Artificial Synapse for AI

IEEE Spectrum Robotics

What's the best type of device from which to build a neural network? Of course, it should be fast and small, consume little power, and reliably store many bits' worth of information. And if it's going to be involved in learning new tricks as well as performing those tricks, it has to behave predictably during the learning process. Neural networks can be thought of as a group of cells connected to other cells. These connections, synapses in biological neurons, all have particular strengths, or weights, associated with them. Rather than use the logic and memory of ordinary CPUs to represent these, companies and academic researchers have been working on ways of representing them in arrays of different kinds of nonvolatile memories.


Novel synaptic architecture for brain inspired computing

#artificialintelligence

The brain, with all its magnificent capabilities, is powered by less than 20 watts. Stop to think about that for a second. As I write this blog my laptop is using about 80 watts, yet at only a fourth of the power, our brain outperforms state-of-the-art supercomputers by several orders of magnitude when it comes to energy efficiency and volume. For this reason, it shouldn't be surprising that scientists around the world are seeking inspiration from the human brain as a promising avenue towards next-generation AI computing systems. While the IT industry has made significant progress in the past several years, particularly in using machine learning for computer vision and speech recognition, current technology is hitting a wall when it comes to deep neural networks matching the power efficiency of their biological counterparts. But this could be about to change. As reported last week in Nature Communications, my colleagues and I at IBM Research and collaborators at EPFL and the New Jersey Institute of Technology have developed and experimentally tested an artificial synapse architecture using 1 million devices, a significant step towards realizing large-scale and energy-efficient neuromorphic computing technology.


New Hardware for Massive Neural Networks

Coon, Darryl D., Perera, A. G. Unil

Neural Information Processing Systems

Abstract: Transient phenomena associated with forward-biased silicon p-n-n structures at 4.2 K show remarkable similarities with biological neurons. The devices play a role similar to the two-terminal switching elements in Hodgkin-Huxley equivalent circuit diagrams. The devices provide simpler and more realistic neuron emulation than transistors or op-amps. They have such low power and current requirements that they could be used in massive neural networks. Some observed properties of simple circuits containing the devices include action potentials, refractory periods, threshold behavior, excitation, inhibition, summation over synaptic inputs, synaptic weights, temporal integration, memory, network connectivity modification based on experience, pacemaker activity, firing thresholds, coupling to sensors with graded signal outputs, and the dependence of firing rate on input current. Transfer functions for simple artificial neurons with spike-train inputs and spike-train outputs have been measured and correlated with input coupling.