ReRAM
HePGA: A Heterogeneous Processing-in-Memory based GNN Training Accelerator
Ogbogu, Chukwufumnanya, Narang, Gaurav, Joardar, Biresh Kumar, Doppa, Janardhan Rao, Chakrabarty, Krishnendu, Pande, Partha Pratim
Processing-In-Memory (PIM) architectures offer a promising approach to accelerating Graph Neural Network (GNN) training and inference. However, various PIM devices such as ReRAM, FeFET, PCM, MRAM, and SRAM exist, each offering unique trade-offs in terms of power, latency, area, and non-idealities. A heterogeneous manycore architecture enabled by 3D integration can combine multiple PIM devices on a single platform to enable energy-efficient and high-performance GNN training. In this work, we propose a 3D heterogeneous PIM-based accelerator for GNN training referred to as HePGA. We leverage the unique characteristics of GNN layers and their associated computing kernels to optimize their mapping onto different PIM devices as well as planar tiers. Our experimental analysis shows that HePGA outperforms existing PIM-based architectures by up to 3.8x in energy efficiency (TOPS/W) and 6.8x in compute efficiency (TOPS/mm2), without sacrificing GNN prediction accuracy. Finally, we demonstrate the applicability of HePGA to accelerating inference of emerging transformer models.
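The abstract does not spell out HePGA's mapping algorithm, but the core idea of matching kernels to device trade-offs can be illustrated. The Python sketch below greedily assigns GNN training kernels to PIM device types using a weighted energy/latency cost model; all device names, cost numbers, and kernel weights are invented placeholders, not figures from the paper.

```python
# Illustrative kernel-to-device mapping sketch (not HePGA's actual algorithm).
# Normalized per-MAC (energy, latency, area) costs, invented for illustration.
DEVICE_COSTS = {
    "ReRAM": (1.0, 1.0, 0.6),
    "FeFET": (0.7, 1.2, 0.5),
    "SRAM":  (1.5, 0.4, 1.0),
}

# GNN training kernels with a rough weight for how latency-critical each is.
KERNELS = {
    "aggregation":   {"macs": 8e9, "latency_weight": 0.3},  # sparse, bandwidth-bound
    "combination":   {"macs": 2e9, "latency_weight": 0.7},  # dense GEMM, compute-bound
    "weight_update": {"macs": 1e9, "latency_weight": 0.5},
}

def map_kernels(kernels, devices):
    """Greedily assign each kernel to the device minimizing a weighted
    energy/latency cost -- a stand-in for a real mapping optimizer."""
    mapping = {}
    for name, k in kernels.items():
        def cost(dev):
            energy, latency, _area = devices[dev]
            w = k["latency_weight"]
            return k["macs"] * ((1 - w) * energy + w * latency)
        mapping[name] = min(devices, key=cost)
    return mapping

print(map_kernels(KERNELS, DEVICE_COSTS))
```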
Atleus: Accelerating Transformers on the Edge Enabled by 3D Heterogeneous Manycore Architectures
Dhingra, Pratyush, Doppa, Janardhan Rao, Pande, Partha Pratim
Transformer architectures have become the standard neural network model for various machine learning applications, including natural language processing and computer vision. However, the compute and memory requirements introduced by transformer models make them challenging to adopt for edge applications. Furthermore, fine-tuning pre-trained transformers (e.g., foundation models) is a common task to enhance a model's predictive performance on specific tasks/applications. Existing transformer accelerators are oblivious to the complexities introduced by fine-tuning. In this paper, we propose the design of a three-dimensional (3D) heterogeneous architecture referred to as Atleus that incorporates heterogeneous computing resources specifically optimized to accelerate transformer models for the dual purposes of fine-tuning and inference. Specifically, Atleus utilizes non-volatile memory and a systolic array to accelerate transformer computational kernels on an integrated 3D platform. Moreover, we design a suitable NoC to achieve high performance and energy efficiency. Finally, Atleus adopts an effective quantization scheme to support model compression. Experimental results demonstrate that Atleus outperforms the existing state-of-the-art by up to 56x in performance and 64.5x in energy efficiency.
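Atleus's specific quantization scheme is not detailed in the abstract; as a point of reference, the sketch below shows plain uniform symmetric quantization, a common baseline for the kind of model compression described. The bit-width and tensor are illustrative assumptions.

```python
import numpy as np

def quantize_symmetric(w, bits=8):
    """Uniform symmetric quantization: map float weights to signed integers
    sharing one scale factor per tensor."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(w)) / qmax
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)   # stand-in transformer weight tile
q, scale = quantize_symmetric(w)
print("max abs reconstruction error:", np.max(np.abs(w - q * scale)))
```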
Comparative Evaluation of Memory Technologies for Synaptic Crossbar Arrays - Part 2: Design Knobs and DNN Accuracy Trends
Victor, Jeffry, Wang, Chunguang, Gupta, Sumeet K.
Crossbar memory arrays have been touted as the workhorse of in-memory computing (IMC)-based acceleration of Deep Neural Networks (DNNs), but the associated hardware non-idealities limit their efficacy. To address this, cross-layer design solutions that reduce the impact of hardware non-idealities on DNN accuracy are needed. In Part 1 of this paper, we established the co-optimization strategies for various memory technologies and their crossbar arrays, and conducted a comparative technology evaluation in the context of IMC robustness. In this part, we analyze various design knobs such as array size and bit-slice (number of bits per device) and their impact on the performance of 8T SRAM, ferroelectric transistor (FeFET), Resistive RAM (ReRAM), and spin-orbit-torque magnetic RAM (SOT-MRAM) in the context of inference accuracy at the 7nm technology node. Further, we study the effect of circuit design solutions such as Partial Wordline Activation (PWA) and custom ADC reference levels that reduce the hardware non-idealities, and comparatively analyze the response of each technology to such accuracy-enhancing techniques. Our results on ResNet-20 (with CIFAR-10) show that PWA increases accuracy by up to 32.56%, while custom ADC reference levels yield up to 31.62% accuracy enhancement. We observe that compared to the other technologies, FeFET, by virtue of its small layout height and high distinguishability of its memory states, is best suited for large arrays. For higher bit-slices and a more complex dataset (ResNet-50 with CIFAR-100), we found that ReRAM matches the performance of FeFET.
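To make the PWA idea concrete, the sketch below models a crossbar MAC where only a few wordlines fire at a time, so each partial analog sum spans a smaller range and suffers less ADC quantization error before the partial results are accumulated digitally. The ADC model, chunk size, and bipolar/binary encoding are simplifying assumptions, not the paper's circuit.

```python
import numpy as np

def crossbar_mac_pwa(weights, inputs, rows_per_step=16, adc_bits=6):
    """Crossbar MAC with Partial Wordline Activation: activate
    `rows_per_step` wordlines per step, quantize each partial sum with a
    coarse ADC model, then accumulate the partial results digitally."""
    n_rows = weights.shape[0]
    total = np.zeros(weights.shape[1])
    for start in range(0, n_rows, rows_per_step):
        partial = inputs[start:start+rows_per_step] @ weights[start:start+rows_per_step]
        # Hypothetical ADC: quantize the partial sum into 2**adc_bits levels
        # over the worst-case partial-sum range for this encoding.
        lo, hi = -rows_per_step, rows_per_step
        step = (hi - lo) / 2 ** adc_bits
        partial = np.round((partial - lo) / step) * step + lo
        total += partial
    return total

rng = np.random.default_rng(0)
W = rng.choice([-1, 1], size=(128, 8))   # bipolar weights, one bit-slice
x = rng.choice([0, 1], size=128)         # binary input vector
print(crossbar_mac_pwa(W, x))
```

Smaller `rows_per_step` shrinks the analog dynamic range each ADC conversion must cover, which is the mechanism behind the accuracy gains the paper attributes to PWA.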
Stuck-at Faults in ReRAM Neuromorphic Circuit Array and their Correction through Machine Learning
In this paper, we study the inference accuracy of Resistive Random Access Memory (ReRAM) neuromorphic circuits under stuck-at faults (stuck-on, stuck-off, and stuck at a certain resistive value). A Python-based simulation framework is used to perform supervised machine learning (a neural network with 1 input layer, 3 hidden layers, and 1 output layer) on handwritten digits and to construct a corresponding fully analog neuromorphic circuit (4 synaptic arrays) simulated in Spectre. A generic 45nm Process Development Kit (PDK) was used. We study the difference in inference accuracy degradation due to stuck-on versus stuck-off defects. Various defect patterns are studied, including circular, ring, row, column, and circular-complement defects. It is found that stuck-on and stuck-off defects have a similar effect on inference accuracy. However, it is also found that if there is spatial defect variation across the columns, the inference accuracy may degrade significantly. We also propose a machine learning (ML) strategy to recover the accuracy lost to stuck-at faults, improving inference accuracy from 48% to 85% in a defective neuromorphic circuit.
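A minimal fault-injection sketch in the spirit of this study is shown below: it overwrites a random subset of programmed crossbar conductances with stuck-on or stuck-off values, after which the degraded matrix could be fed through an inference pass or a retraining loop. The fault rate and conductance levels are illustrative, not the paper's settings.

```python
import numpy as np

def inject_stuck_at(weights, fault_rate=0.05, g_on=1.0, g_off=0.0, seed=0):
    """Inject stuck-on / stuck-off faults into a crossbar conductance matrix.
    A faulty cell ignores its programmed value and always reads g_on or g_off."""
    rng = np.random.default_rng(seed)
    faulty = weights.copy()
    mask = rng.random(weights.shape) < fault_rate     # which cells are faulty
    stuck_on = rng.random(weights.shape) < 0.5        # half the faults stuck-on
    faulty[mask & stuck_on] = g_on
    faulty[mask & ~stuck_on] = g_off
    return faulty

G = np.random.default_rng(1).uniform(0.0, 1.0, size=(64, 64))  # programmed conductances
G_faulty = inject_stuck_at(G, fault_rate=0.1)
print("fraction of cells changed:", np.mean(G != G_faulty))
```

Spatially structured patterns (row, column, ring) can be modeled by replacing the random `mask` with a geometric one, which is how column-wise defect variation, the damaging case the paper identifies, would be reproduced.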
Zero-Space Cost Fault Tolerance for Transformer-based Language Models on ReRAM
Li, Bingbing, Yuan, Geng, Wang, Zigeng, Huang, Shaoyi, Peng, Hongwu, Behnam, Payman, Wen, Wujie, Liu, Hang, Ding, Caiwen
Resistive Random Access Memory (ReRAM) has emerged as a promising platform for deep neural networks (DNNs) due to its support for parallel in-situ matrix-vector multiplication. However, hardware failures, such as stuck-at-fault defects, can result in significant prediction errors during model inference. While additional crossbars can be used to address these failures, they come with storage overhead and are not efficient in terms of space, energy, and cost. In this paper, we propose a fault protection mechanism that incurs zero space cost. Our approach includes: 1) differentiable structure pruning of rows and columns to reduce model redundancy, 2) weight duplication and voting for robust output, and 3) embedding duplicated most significant bits (MSBs) into the model weight. We evaluate our method on nine tasks of the GLUE benchmark with the BERT model, and experimental results prove its effectiveness.
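Of the three mechanisms, weight duplication and voting is the most self-contained to illustrate. The sketch below shows plain bitwise majority voting over duplicated 8-bit weight copies; it is a simplified stand-in, not the paper's MSB-embedding scheme, and the fault pattern is contrived for the demo.

```python
import numpy as np

def majority_vote(copies):
    """Bitwise majority vote over duplicated 8-bit weight copies.
    With 3 copies, any single stuck-at fault per bit position is corrected."""
    copies = np.asarray(copies, dtype=np.uint8)
    voted = np.zeros_like(copies[0])
    for bit in range(8):
        bits = (copies >> bit) & 1                     # (n_copies, *weight_shape)
        maj = (bits.sum(axis=0) * 2 > copies.shape[0]).astype(np.uint8)
        voted |= maj << bit
    return voted

w = np.array([0b10110100], dtype=np.uint8)
c1, c2, c3 = w.copy(), w.copy(), w.copy()
c2[0] |= 0b00000001        # a stuck-on fault flips one bit in one copy
print(bin(majority_vote([c1, c2, c3])[0]))  # recovers 0b10110100
```

The "zero space cost" in the paper comes from storing such duplicates in crossbar capacity freed by structured pruning, rather than in extra arrays.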
TL-nvSRAM-CIM: Ultra-High-Density Three-Level ReRAM-Assisted Computing-in-nvSRAM with DC-Power Free Restore and Ternary MAC Operations
Wang, Dengfeng, Xu, Liukai, Liu, Songyuan, Li, Zhi, Chen, Yiming, He, Weifeng, Li, Xueqing, Su, Yanan
Accommodating all the weights of large-scale NNs on-chip remains a great challenge for SRAM-based computing-in-memory (SRAM-CIM) with limited on-chip capacity. Previous non-volatile SRAM-CIM (nvSRAM-CIM) addresses this issue by integrating high-density single-level ReRAMs on top of high-efficiency SRAM-CIM for weight storage, eliminating off-chip memory access. However, previous SL-nvSRAM-CIM suffers from poor scalability as the number of SL-ReRAMs increases, as well as limited computing efficiency. To overcome these challenges, this work proposes an ultra-high-density three-level ReRAM-assisted computing-in-nonvolatile-SRAM (TL-nvSRAM-CIM) scheme for large NN models. Clustered n-selector-n-ReRAM (cluster-nSnR) structures are employed for reliable weight restore with eliminated DC power. Furthermore, a ternary SRAM-CIM mechanism with a differential computing scheme is proposed for energy-efficient ternary MAC operations while preserving high NN accuracy. The proposed TL-nvSRAM-CIM achieves 7.8x higher storage density compared with state-of-the-art works. Moreover, TL-nvSRAM-CIM shows up to 2.9x and 1.9x higher energy efficiency compared to the baseline SRAM-CIM and ReRAM-CIM designs, respectively.
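The functional idea of a differential ternary MAC can be shown in a few lines: a weight in {-1, 0, +1} splits into positive and negative bit planes, and two unsigned MACs are subtracted, mirroring the two complementary array readouts of a differential scheme. This is a behavioral sketch, not the paper's circuit.

```python
import numpy as np

def ternary_mac_differential(x, w):
    """Differential ternary MAC: split ternary weights into positive and
    negative bit planes, compute two unsigned MACs, and subtract."""
    w_pos = (w > 0).astype(np.int32)   # +1 entries
    w_neg = (w < 0).astype(np.int32)   # -1 entries
    return x @ w_pos - x @ w_neg

rng = np.random.default_rng(0)
w = rng.choice([-1, 0, 1], size=(16, 4))
x = rng.integers(0, 2, size=16)
assert np.array_equal(ternary_mac_differential(x, w), x @ w)  # same result
print(ternary_mac_differential(x, w))
```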
Weebit Nano tapes-out first 22nm demo chip
HOD HASHARON, Israel – Jan. 3, 2023 – Weebit Nano Limited (ASX:WBT), a leading developer of next-generation memory technologies for the global semiconductor industry, has taped out (released to manufacturing) demonstration chips integrating its embedded Resistive Random-Access Memory (ReRAM or RRAM) module in an advanced 22nm FD-SOI (fully depleted silicon on insulator) process technology. This is the first tape-out of Weebit ReRAM in 22nm, one of the industry's most common process nodes and a geometry where embedded flash is not viable. Weebit worked with its development partners CEA-Leti and CEA-List to successfully scale its ReRAM technology down to 22nm. The teams designed a full IP memory module integrating a multi-megabit ReRAM block targeting the 22nm FD-SOI process, intended to deliver outstanding performance for connected and ultra-low-power applications such as IoT and edge AI. As embedded flash is unable to scale below 28nm, new non-volatile memory (NVM) technology is needed for smaller process geometries.
Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation
Yazdanbakhsh, Amir, Moradifirouzabadi, Ashkan, Li, Zheng, Kang, Mingu
As its core computation, a self-attention mechanism gauges pairwise correlations across the entire input sequence. Despite favorable performance, calculating pairwise correlations is prohibitively costly. While recent work has shown the benefits of runtime pruning of elements with low attention scores, the quadratic complexity of self-attention mechanisms and their on-chip memory capacity demands are overlooked. This work addresses these constraints by architecting an accelerator, called SPRINT, which leverages the inherent parallelism of ReRAM crossbar arrays to compute attention scores in an approximate manner. Our design prunes low attention scores using lightweight analog thresholding circuitry within ReRAM, enabling SPRINT to fetch only a small subset of relevant data to on-chip memory. To mitigate potential negative repercussions for model accuracy, SPRINT re-computes the attention scores for the few fetched data in digital. The combined in-memory pruning and on-chip recomputation of the relevant attention scores enables SPRINT to transform quadratic complexity into merely linear complexity. In addition, we identify and leverage dynamic spatial locality between adjacent attention operations, even after pruning, which eliminates costly yet redundant data fetches. We evaluate our proposed technique on a wide range of state-of-the-art transformer models. On average, SPRINT yields 7.5x speedup and 19.6x energy reduction with a total of 16KB of on-chip memory, while remaining virtually on par with the accuracy of the baseline models (on average, 0.36% degradation).
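The prune-then-recompute flow can be mimicked in software, as in the sketch below: score all keys at low precision (standing in for the analog crossbar), keep only keys whose approximate score clears a threshold, then recompute those few scores at full precision. The threshold, bit-width, and fallback rule are illustrative assumptions, not SPRINT's calibrated values.

```python
import numpy as np

def sprint_like_attention(q, K, threshold=0.0, approx_bits=4):
    """Approximate-score pruning followed by exact recomputation,
    in the spirit of SPRINT's in-memory prune / on-chip recompute split."""
    scale = np.max(np.abs(K)) / (2 ** (approx_bits - 1) - 1)
    K_approx = np.round(K / scale) * scale            # coarse "analog" keys
    approx_scores = K_approx @ q
    keep = np.where(approx_scores > threshold)[0]     # in-memory pruning
    if keep.size == 0:                                # guard: keep best key
        keep = np.array([np.argmax(approx_scores)])
    exact = K[keep] @ q                               # digital recompute
    weights = np.exp(exact - exact.max())             # softmax over survivors
    return keep, weights / weights.sum()

rng = np.random.default_rng(0)
q, K = rng.standard_normal(64), rng.standard_normal((128, 64))
keep, attn = sprint_like_attention(q, K)
print(f"kept {len(keep)}/{K.shape[0]} keys")
```

Only the kept keys (and their values) need fetching to on-chip memory, which is where the linear-complexity claim comes from.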
A High Throughput Generative Vector Autoregression Model for Stochastic Synapses
Hennen, T., Elias, A., Nodin, J. F., Molas, G., Waser, R., Wouters, D. J., Bedau, D.
Recent trends in computing hardware have placed increasing emphasis on neuromorphic architectures implementing machine learning (ML) algorithms directly in hardware. Such bio-inspired approaches, through in-memory computation and massive parallelism, excel in new classes of computational problems and offer promising advantages with respect to power consumption and error resiliency. While CMOS-based neuromorphic computing (NC) implementations have made substantial progress recently, new materials and physical mechanisms may ultimately provide better opportunities for energy efficiency and scaling [1, 2, 3]. A specific functionality required in NC applications is the ability to mimic synaptic connections and plasticity by allowing the storage of large numbers of interconnected and continuously adaptable resistance values. Several candidate memory technologies, such as MRAM, ReRAM, PCM, and CeRAM, are emerging to cover this behavior using different physical mechanisms [4, 5, 6, 7]. Among these, ReRAM is attractive for its simplicity of materials and device structure, providing the necessary CMOS compatibility and scalability [8]. ReRAM is essentially a two-terminal nanoscale electrochemical cell, whose variable resistance state is based on manipulation of the point defect configuration in the oxide material (depicted in Figure 1).
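As a flavor of the titular technique, the sketch below generates correlated device-behavior traces with a first-order vector autoregression: each step's feature vector (e.g., log-resistance and switching voltage) depends linearly on the previous step plus Gaussian noise, reproducing cycle-to-cycle correlations. The coefficient matrix and noise scales are invented, not fitted parameters from the paper.

```python
import numpy as np

def var1_synapse_traces(n_steps=1000, n_feats=2, seed=0):
    """Minimal VAR(1) generator for stochastic synapse behavior:
    x[t] = A @ x[t-1] + noise, giving temporally correlated cycles."""
    rng = np.random.default_rng(seed)
    A = np.array([[0.9, 0.05],
                  [0.02, 0.8]])              # illustrative autoregression matrix
    noise_scale = np.array([0.1, 0.05])      # illustrative per-feature noise
    x = np.zeros(n_feats)
    trace = np.empty((n_steps, n_feats))
    for t in range(n_steps):
        x = A @ x + rng.standard_normal(n_feats) * noise_scale
        trace[t] = x
    return trace

trace = var1_synapse_traces()
print("lag-1 autocorrelation:", np.corrcoef(trace[:-1, 0], trace[1:, 0])[0, 1])
```

Because each step is a cheap matrix-vector product, such a generative model can synthesize device traces at very high throughput, which is the point of the paper's title.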
EETimes - ReRAM Research Improves Independent AI Learning
Recent research using Weebit Nano's silicon oxide (SiOx) ReRAM technology outlines a brain-inspired artificial intelligence (AI) system which can perform unsupervised learning tasks with high accuracy. The work was done by researchers at Politecnico di Milano (PoliMi) and presented in a recent joint paper with the company that details a novel AI self-learning demonstration based on Weebit's SiOx ReRAM. The memory technology is considered a prime candidate to succeed NAND flash memory because of its potential to be 1,000 times faster while using 1,000 times less energy than NAND, while at the same time lasting 100 times longer. Weebit's SiOx ReRAM is also appealing because it can leverage existing manufacturing processes. ReRAM has also been eyed for AI applications by several research organizations.