ReRAM
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
Ogbogu, Chukwufumnanya, Narang, Gaurav, Joardar, Biresh Kumar, Doppa, Janardhan Rao, Chakrabarty, Krishnendu, Pande, Partha Pratim
Processing-In-Memory (PIM) architectures offer a promising approach to accelerating Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and SRAM exist, each offering unique trade-offs in terms of power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training referred to as HePGA. We leverage the unique characteristics of GNN layers and their associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x in energy efficiency (TOPS/W) and 6.8x in compute efficiency (TOPS/mm2), without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerating inference of emerging transformer models.
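The abstract does not spell out HePGA's mapping algorithm, but the core idea of matching kernels to device trade-offs can be illustrated. The Python sketch below greedily assigns GNN training kernels to PIM device types using a weighted energy/latency cost model; all device names, cost numbers, and kernel weights are invented placeholders, not figures from the paper.

```python
# Illustrative kernel-to-device mapping sketch (not HePGA's actual algorithm).
# Normalized per-MAC (energy, latency, area) costs, invented for illustration.
DEVICE_COSTS = {
    "ReRAM": (1.0, 1.0, 0.6),
    "FeFET": (0.7, 1.2, 0.5),
    "SRAM":  (1.5, 0.4, 1.0),
}

# GNN training kernels with a rough weight for how latency-critical each is.
KERNELS = {
    "aggregation":   {"macs": 8e9, "latency_weight": 0.3},  # sparse, bandwidth-bound
    "combination":   {"macs": 2e9, "latency_weight": 0.7},  # dense GEMM, compute-bound
    "weight_update": {"macs": 1e9, "latency_weight": 0.5},
}

def map_kernels(kernels, devices):
    """Greedily assign each kernel to the device minimizing a weighted
    energy/latency cost -- a stand-in for a real mapping optimizer."""
    mapping = {}
    for name, k in kernels.items():
        def cost(dev):
            energy, latency, _area = devices[dev]
            w = k["latency_weight"]
            return k["macs"] * ((1 - w) * energy + w * latency)
        mapping[name] = min(devices, key=cost)
    return mapping

print(map_kernels(KERNELS, DEVICE_COSTS))
```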
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Dhingra, Pratyush, Doppa, Janardhan Rao, Pande, Partha Pratim
Transformer architectures have become the standard neural network model for various machine learning applications, including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to enhance a model's predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to the complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and a systolic array to accelerate transformer computational kernels on an integrated 3D platform. Moreover, we design a suitable NoC to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. Experimental results demonstrate that Atleus outperforms the existing state-of-the-art by up to 56x in performance and 64.5x in energy efficiency.
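Atleus's specific quantization scheme is not detailed in the abstract; as a point of reference, the sketch below shows plain uniform symmetric quantization, a common baseline for the kind of model compression described. The bit-width and tensor are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    """Uniform symmetric quantization: map float weights to signed integers
    sharing one scale factor per tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)   # stand-in transformer weight tile
q, scale = quantize_symmetric(w)
print("max abs reconstruction error:", np.max(np.abs(w - q * scale)))
```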
Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays - Part 2: Design Knobs and DNN Accuracy Trends
Victor, Jeffry, Wang, Chunguang, Gupta, Sumeet K.
Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM), and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at the 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities, and comparatively analyze the response of each technology to such accuracy-enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56%, while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100), we found that ReRAM matches the performance of FeFET.
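To make the PWA idea concrete, the sketch below models a crossbar MAC where only a few wordlines fire at a time, so each partial analog sum spans a smaller range and suffers less ADC quantization error before the partial results are accumulated digitally. The ADC model, chunk size, and bipolar/binary encoding are simplifying assumptions, not the paper's circuit.

```python
import numpy as np

def crossbar_mac_pwa(weights, inputs, rows_per_step=16, adc_bits=6):
    """Crossbar MAC with Partial Wordline Activation: activate
    `rows_per_step` wordlines per step, quantize each partial sum with a
    coarse ADC model, then accumulate the partial results digitally."""
    n_rows = weights.shape[0]
    total = np.zeros(weights.shape[1])
    for start in range(0, n_rows, rows_per_step):
        partial = inputs[start:start+rows_per_step] @ weights[start:start+rows_per_step]
        # Hypothetical ADC: quantize the partial sum into 2**adc_bits levels
        # over the worst-case partial-sum range for this encoding.
        lo, hi = -rows_per_step, rows_per_step
        step = (hi - lo) / 2 ** adc_bits
        partial = np.round((partial - lo) / step) * step + lo
        total += partial
    return total

rng = np.random.default_rng(0)
W = rng.choice([-1, 1], size=(128, 8))   # bipolar weights, one bit-slice
x = rng.choice([0, 1], size=128)         # binary input vector
print(crossbar_mac_pwa(W, x))
```

Smaller `rows_per_step` shrinks the analog dynamic range each ADC conversion must cover, which is the mechanism behind the accuracy gains the paper attributes to PWA.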
Stuck-at Faults in ReRAM Neuromorphic Circuit Array and their Correction through Machine Learning
In this paper, we study the inference accuracy of Resistive Random Access Memory (ReRAM) neuromorphic circuits under stuck-at faults (stuck-on, stuck-off, and stuck at a certain resistive value). A Python-based simulation framework is used to perform supervised machine learning (a neural network with 1 input layer, 3 hidden layers, and 1 output layer) on handwritten digits and to construct a corresponding fully analog neuromorphic circuit (4 synaptic arrays) simulated in Spectre. A generic 45nm Process Development Kit (PDK) was used. We study the difference in inference accuracy degradation due to stuck-on versus stuck-off defects. Various defect patterns are studied, including circular, ring, row, column, and circular-complement defects. It is found that stuck-on and stuck-off defects have a similar effect on inference accuracy. However, it is also found that if there is spatial defect variation across the columns, the inference accuracy may degrade significantly. We also propose a machine learning (ML) strategy to recover the accuracy lost to stuck-at faults, improving inference accuracy from 48% to 85% in a defective neuromorphic circuit.
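A minimal fault-injection sketch in the spirit of this study is shown below: it overwrites a random subset of programmed crossbar conductances with stuck-on or stuck-off values, after which the degraded matrix could be fed through an inference pass or a retraining loop. The fault rate and conductance levels are illustrative, not the paper's settings.

```python
import numpy as np

def inject_stuck_at(weights, fault_rate=0.05, g_on=1.0, g_off=0.0, seed=0):
    """Inject stuck-on / stuck-off faults into a crossbar conductance matrix.
    A faulty cell ignores its programmed value and always reads g_on or g_off."""
    rng = np.random.default_rng(seed)
    faulty = weights.copy()
    mask = rng.random(weights.shape) < fault_rate     # which cells are faulty
    stuck_on = rng.random(weights.shape) < 0.5        # half the faults stuck-on
    faulty[mask & stuck_on] = g_on
    faulty[mask & ~stuck_on] = g_off
    return faulty

G = np.random.default_rng(1).uniform(0.0, 1.0, size=(64, 64))  # programmed conductances
G_faulty = inject_stuck_at(G, fault_rate=0.1)
print("fraction of cells changed:", np.mean(G != G_faulty))
```

Spatially structured patterns (row, column, ring) can be modeled by replacing the random `mask` with a geometric one, which is how column-wise defect variation, the damaging case the paper identifies, would be reproduced.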
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Li, Bingbing, Yuan, Geng, Wang, Zigeng, Huang, Shaoyi, Peng, Hongwu, Behnam, Payman, Wen, Wujie, Liu, Hang, Ding, Caiwen
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its effectiveness.
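Of the three mechanisms, weight duplication and voting is the most self-contained to illustrate. The sketch below shows plain bitwise majority voting over duplicated 8-bit weight copies; it is a simplified stand-in, not the paper's MSB-embedding scheme, and the fault pattern is contrived for the demo.

```python
import numpy as np

def majority_vote(copies):
    """Bitwise majority vote over duplicated 8-bit weight copies.
    With 3 copies, any single stuck-at fault per bit position is corrected."""
    copies = np.asarray(copies, dtype=np.uint8)
    voted = np.zeros_like(copies[0])
    for bit in range(8):
        bits = (copies >> bit) & 1                     # (n_copies, *weight_shape)
        maj = (bits.sum(axis=0) * 2 > copies.shape[0]).astype(np.uint8)
        voted |= maj << bit
    return voted

w = np.array([0b10110100], dtype=np.uint8)
c1, c2, c3 = w.copy(), w.copy(), w.copy()
c2[0] |= 0b00000001        # a stuck-on fault flips one bit in one copy
print(bin(majority_vote([c1, c2, c3])[0]))  # recovers 0b10110100
```

The "zero space cost" in the paper comes from storing such duplicates in crossbar capacity freed by structured pruning, rather than in extra arrays.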
TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations
Wang, Dengfeng, Xu, Liukai, Liu, Songyuan, Li, Zhi, Chen, Yiming, He, Weifeng, Li, Xueqing, Su, Yanan
Accommodating all the weights of large-scale NNs on-chip remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on top of high-efficiency SRAM-CIM for weight storage, eliminating off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability as the number of SL-ReRAMs increases, as well as limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAM-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. Clustered n-selector-n-ReRAM (cluster-nSnR) structures are employed for reliable weight restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x higher energy efficiency compared to the baseline SRAM-CIM and ReRAM-CIM designs, respectively.
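The functional idea of a differential ternary MAC can be shown in a few lines: a weight in {-1, 0, +1} splits into positive and negative bit planes, and two unsigned MACs are subtracted, mirroring the two complementary array readouts of a differential scheme. This is a behavioral sketch, not the paper's circuit.

```python
import numpy as np

def ternary_mac_differential(x, w):
    """Differential ternary MAC: split ternary weights into positive and
    negative bit planes, compute two unsigned MACs, and subtract."""
    w_pos = (w > 0).astype(np.int32)   # +1 entries
    w_neg = (w < 0).astype(np.int32)   # -1 entries
    return x @ w_pos - x @ w_neg

rng = np.random.default_rng(0)
w = rng.choice([-1, 0, 1], size=(16, 4))
x = rng.integers(0, 2, size=16)
assert np.array_equal(ternary_mac_differential(x, w), x @ w)  # same result
print(ternary_mac_differential(x, w))
```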
Weebit Nano tapes-out first 22nm demo chip
HOD HASHARON, Israel – Jan. 3, 2023 – Weebit Nano Limited (ASX:WBT), a leading developer of next-generation memory technologies for the global semiconductor industry, has taped out (released to manufacturing) demonstration chips integrating its embedded Resistive Random-Access Memory (ReRAM or RRAM) module in an advanced 22nm FD-SOI (fully depleted silicon on insulator) process technology. This is the first tape-out of Weebit ReRAM in 22nm, one of the industry's most common process nodes and a geometry where embedded flash is not viable. Weebit worked with its development partners CEA-Leti and CEA-List to successfully scale its ReRAM technology down to 22nm. The teams designed a full IP memory module integrating a multi-megabit ReRAM block targeting the 22nm FD-SOI process, intended to deliver outstanding performance for connected and ultra-low-power applications such as IoT and edge AI. As embedded flash is unable to scale below 28nm, new non-volatile memory (NVM) technology is needed for smaller process geometries.
Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Yazdanbakhsh, Amir, Moradifirouzabadi, Ashkan, Li, Zheng, Kang, Mingu
As its core computation, a self-attention mechanism gauges pairwise correlations across the entire input sequence. Despite favorable performance, calculating pairwise correlations is prohibitively costly. While recent work has shown the benefits of runtime pruning of elements with low attention scores, the quadratic complexity of self-attention mechanisms and their on-chip memory capacity demands are overlooked. This work addresses these constraints by architecting an accelerator, called SPRINT, which leverages the inherent parallelism of ReRAM crossbar arrays to compute attention scores in an approximate manner. Our design prunes low attention scores using lightweight analog thresholding circuitry within ReRAM, enabling SPRINT to fetch only a small subset of relevant data to on-chip memory. To mitigate potential negative repercussions for model accuracy, SPRINT re-computes the attention scores for the few fetched data in digital. The combined in-memory pruning and on-chip recomputation of the relevant attention scores enables SPRINT to transform quadratic complexity into merely linear complexity. In addition, we identify and leverage dynamic spatial locality between adjacent attention operations, even after pruning, which eliminates costly yet redundant data fetches. We evaluate our proposed technique on a wide range of state-of-the-art transformer models. On average, SPRINT yields 7.5x speedup and 19.6x energy reduction with a total of 16KB of on-chip memory, while remaining virtually on par with the accuracy of the baseline models (on average, 0.36% degradation).
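The prune-then-recompute flow can be mimicked in software, as in the sketch below: score all keys at low precision (standing in for the analog crossbar), keep only keys whose approximate score clears a threshold, then recompute those few scores at full precision. The threshold, bit-width, and fallback rule are illustrative assumptions, not SPRINT's calibrated values.

```python
import numpy as np

def sprint_like_attention(q, K, threshold=0.0, approx_bits=4):
    """Approximate-score pruning followed by exact recomputation,
    in the spirit of SPRINT's in-memory prune / on-chip recompute split."""
    scale = np.max(np.abs(K)) / (2 ** (approx_bits - 1) - 1)
    K_approx = np.round(K / scale) * scale            # coarse "analog" keys
    approx_scores = K_approx @ q
    keep = np.where(approx_scores > threshold)[0]     # in-memory pruning
    if keep.size == 0:                                # guard: keep best key
        keep = np.array([np.argmax(approx_scores)])
    exact = K[keep] @ q                               # digital recompute
    weights = np.exp(exact - exact.max())             # softmax over survivors
    return keep, weights / weights.sum()

rng = np.random.default_rng(0)
q, K = rng.standard_normal(64), rng.standard_normal((128, 64))
keep, attn = sprint_like_attention(q, K)
print(f"kept {len(keep)}/{K.shape[0]} keys")
```

Only the kept keys (and their values) need fetching to on-chip memory, which is where the linear-complexity claim comes from.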
A High Throughput Generative Vector Autoregression Model for Stochastic Synapses
Hennen, T., Elias, A., Nodin, J. F., Molas, G., Waser, R., Wouters, D. J., Bedau, D.
Recent trends in computing hardware have placed increasing emphasis on neuromorphic architectures implementing machine learning (ML) algorithms directly in hardware. Such bio-inspired approaches, through in-memory computation and massive parallelism, excel in new classes of computational problems and offer promising advantages with respect to power consumption and error resiliency. While CMOS-based neuromorphic computing (NC) implementations have made substantial progress recently, new materials and physical mechanisms may ultimately provide better opportunities for energy efficiency and scaling [1, 2, 3]. A specific functionality required in NC applications is the ability to mimic synaptic connections and plasticity by allowing the storage of large numbers of interconnected and continuously adaptable resistance values. Several candidate memory technologies, such as MRAM, ReRAM, PCM, and CeRAM, are emerging to cover this behavior using different physical mechanisms [4, 5, 6, 7]. Among these, ReRAM is attractive for its simplicity of materials and device structure, providing the necessary CMOS compatibility and scalability [8]. ReRAM is essentially a two-terminal nanoscale electrochemical cell, whose variable resistance state is based on manipulation of the point defect configuration in the oxide material (depicted in Figure 1).
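As a flavor of the titular technique, the sketch below generates correlated device-behavior traces with a first-order vector autoregression: each step's feature vector (e.g., log-resistance and switching voltage) depends linearly on the previous step plus Gaussian noise, reproducing cycle-to-cycle correlations. The coefficient matrix and noise scales are invented, not fitted parameters from the paper.

```python
import numpy as np

def var1_synapse_traces(n_steps=1000, n_feats=2, seed=0):
    """Minimal VAR(1) generator for stochastic synapse behavior:
    x[t] = A @ x[t-1] + noise, giving temporally correlated cycles."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.05],
                  [0.02, 0.8]])              # illustrative autoregression matrix
    noise_scale = np.array([0.1, 0.05])      # illustrative per-feature noise
    x = np.zeros(n_feats)
    trace = np.empty((n_steps, n_feats))
    for t in range(n_steps):
        x = A @ x + rng.standard_normal(n_feats) * noise_scale
        trace[t] = x
    return trace

trace = var1_synapse_traces()
print("lag-1 autocorrelation:", np.corrcoef(trace[:-1, 0], trace[1:, 0])[0, 1])
```

Because each step is a cheap matrix-vector product, such a generative model can synthesize device traces at very high throughput, which is the point of the paper's title.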
EETimes - ReRAM Research Improves Independent AI Learning
Recent research using Weebit Nano's silicon oxide (SiOx) ReRAM technology outlines a brain-inspired artificial intelligence (AI) system which can perform unsupervised learning tasks with high accuracy. The work was done by researchers at Politecnico di Milano (PoliMi) and presented in a recent joint paper with the company that details a novel AI self-learning demonstration based on Weebit's SiOx ReRAM. The memory technology is considered a prime candidate to succeed NAND flash memory because of its potential to be 1,000 times faster while using 1,000 times less energy than NAND, while at the same time lasting 100 times longer. Weebit's SiOx ReRAM is also appealing because it can leverage existing manufacturing processes. ReRAM has also been eyed for AI applications by several research organizations.