stt-mram
Evaluation of STT-MRAM as a Scratchpad for Training in ML Accelerators
Roy, Sourjya, Wang, Cheng, Raghunathan, Anand
Progress in artificial intelligence and machine learning over the past decade has been driven by the ability to train larger deep neural networks (DNNs), leading to a compute demand that far exceeds the growth in hardware performance afforded by Moore's law. Training DNNs is an extremely memory-intensive process, requiring not just the model weights but also activations and gradients for an entire minibatch to be stored. The need to provide high-density and low-leakage on-chip memory motivates the exploration of emerging non-volatile memory for training accelerators. Spin-Transfer-Torque MRAM (STT-MRAM) offers several desirable properties for training accelerators, including 3-4x higher density than SRAM, significantly reduced leakage power, high endurance, and reasonable access time. On the other hand, MRAM write operations require high write energy and latency due to the need to ensure reliable switching. In this study, we perform a comprehensive device-to-system evaluation and co-optimization of STT-MRAM for efficient ML training accelerator design. We devise a cross-layer simulation framework to evaluate the effectiveness of STT-MRAM as a scratchpad replacing SRAM in a systolic-array-based DNN accelerator. To address the inefficiency of writes in STT-MRAM, we propose to reduce write voltage and duration. To evaluate the ensuing accuracy-efficiency trade-off, we conduct a thorough analysis of the error tolerance of input activations, weights, and errors during training. We propose heterogeneous memory configurations that enable training convergence with good accuracy. We show that MRAM provides up to 15-22x improvement in system-level energy across a suite of DNN benchmarks under iso-capacity and iso-area scenarios. Further optimizing STT-MRAM write operations can provide over 2x improvement in write energy for minimal degradation in application-level training accuracy.
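The error-tolerance analysis described in this abstract can be illustrated with a small bit-flip injection experiment. The sketch below is not the authors' simulation framework; it assumes, purely for illustration, that scratchpad data is stored as 8-bit words and that reduced write voltage or duration shows up as independent bit errors with probability `ber`. The function name `inject_bit_errors` and the array `acts` are hypothetical.

```python
import numpy as np

def inject_bit_errors(values, ber, rng):
    """Flip each bit of a uint8 array independently with probability `ber`.

    Assumption: write failures are independent across bits, which is a
    simplification of real STT-MRAM write-error behavior.
    """
    values = np.asarray(values, dtype=np.uint8)
    # One Bernoulli draw per stored bit: shape (..., 8).
    flips = rng.random(values.shape + (8,)) < ber
    # Pack the 8 per-bit flags into one mask byte per stored element.
    mask = np.packbits(flips, axis=-1).reshape(values.shape)
    # XOR flips exactly the bits selected in the mask.
    return values ^ mask

rng = np.random.default_rng(0)
# Hypothetical 8-bit activations held in the scratchpad.
acts = rng.integers(0, 256, size=(4, 1024), dtype=np.uint8)
noisy = inject_bit_errors(acts, ber=1e-3, rng=rng)

flipped = int(np.unpackbits(acts ^ noisy).sum())
print(f"flipped {flipped} of {acts.size * 8} bits")
```

Applying such a function separately to activations, weights, and errors at different `ber` values is one way to probe which data structures tolerate relaxed writes, which is the kind of question that motivates heterogeneous memory configurations.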
STT MRAM for Artificial Intelligence Applications
High Performance, Nonvolatile, Unlimited Endurance…
Memory element: MTJ (Magnetic Tunnel Junction). Information is stored by magnetic polarization (nonvolatile) instead of charge. The MTJ bit state, "1" (high resistance) or "0" (low resistance), is written by spin-transfer torque with a (polarized) current across the MTJ.
STT-MRAM cell (1T MTJ): extremely fast (as LL cache/DRAM); nonvolatile (persistent); unlimited endurance (10^14); high density (1T per cell); scalable to 0x nm.
Avalanche Technology at Semicon Taiwan 2020
STT-MRAM broad applications
Embedded applications: unified eNVM (Flash-like): eFlash, eOTP, eFuse; LL cache memory (SRAM-like): L3, eDRAM, slow SRAM; new market applications (AI, IoT…): one single chip for both embedded storage and working memory.
Stand-alone applications: nvSRAM market; memory buffers; persistent DRAM; DRAM*; new market applications*; storage class memory. (*with 3D stack)
MRAM: high speed; unlimited endurance; low power consumption; low manufacturing cost; extended temperature (150 °C).
Y. Huai, Flash Summit 2015, Santa Clara, California, August 12, 2015.
Designing Efficient and High-performance AI Accelerators with Customized STT-MRAM
In this paper, we demonstrate the design of efficient and high-performance AI/Deep Learning accelerators with customized STT-MRAM and a reconfigurable core. Based on model-driven detailed design space exploration, we present the design methodology of an innovative scratchpad-assisted on-chip STT-MRAM-based buffer system for high-performance accelerators. Using analytically derived expressions for the memory occupancy time of AI model weights and activation maps, the volatility of STT-MRAM is adjusted with process- and temperature-variation-aware scaling of the thermal stability factor to optimize the retention time, energy, read/write latency, and area of STT-MRAM. From the analysis of modern AI workloads and accelerator implementation in 14nm technology, we verify the efficacy of our designed AI accelerator with STT-MRAM (STT-AI). Compared to an SRAM-based implementation, the STT-AI accelerator achieves 75% area and 3% power savings at iso-accuracy. Furthermore, with a relaxed bit error rate and negligible AI accuracy trade-off, the designed STT-AI Ultra accelerator achieves 75.4% and 3.5% savings in area and power, respectively, over regular SRAM-based accelerators.
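The link between thermal stability factor and retention time mentioned in this abstract can be made concrete with the standard Néel-Arrhenius relation, t_ret ≈ τ0 · exp(Δ). The sketch below is not the paper's model: it assumes an attempt period τ0 of about 1 ns, treats a single cell in isolation, and ignores the process and temperature variation the authors account for. The names `retention_time_s` and `min_delta` are hypothetical.

```python
import math

TAU0_S = 1e-9  # assumed attempt period (~1 ns); a commonly quoted value

def retention_time_s(delta, tau0=TAU0_S):
    """Mean retention time of one cell for thermal stability factor `delta`."""
    return tau0 * math.exp(delta)

def min_delta(occupancy_s, p_fail, tau0=TAU0_S):
    """Smallest delta keeping the per-bit flip probability <= p_fail
    while data occupies the memory for `occupancy_s` seconds.

    Uses P_fail(t) = 1 - exp(-t / (tau0 * e^delta)) solved for delta.
    """
    return math.log(occupancy_s / (tau0 * -math.log1p(-p_fail)))

# Hypothetical occupancy times: 1 ms, 1 s, 1 hour; target 1e-9 per-bit failure.
for t in (1e-3, 1.0, 3600.0):
    d = min_delta(t, p_fail=1e-9)
    print(f"occupancy {t:g} s -> delta >= {d:.1f}")
```

Under these assumptions, data that lives in the buffer only briefly needs a much smaller Δ than storage-class retention would, which is the intuition behind trading retention for lower write energy, latency, and area.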
New Memories Enable Neural Networks And In-Memory Computing
At the 2019 IEEE IEDM conference, put on by the IEEE Electron Devices Society, and the companion MRAM Global Innovation Forum, put on by the IEEE Magnetics Society, there were talks and workshops outlining the future of solid-state memory and storage. The 2019 IEDM had the strongest focus on solid-state memory of any of these conferences I have attended. This blog will focus on non-volatile memory presentations from the conference. MRAM technology for embedded product applications shows promise for Internet of Things (IoT), Artificial Intelligence, and many other applications. Samsung is one of the companies that have committed to bringing out embedded MRAM in their foundry business.