racetrack memory
Hardware-software co-exploration with racetrack memory based in-memory computing for CNN inference in embedded systems
Choong, Benjamin Chen Ming, Luo, Tao, Liu, Cheng, He, Bingsheng, Zhang, Wei, Zhou, Joey Tianyi
Deep neural networks generate and process large volumes of data, posing challenges for low-resource embedded systems. In-memory computing has been demonstrated as an efficient computing infrastructure and shows promise for embedded AI applications. Among newly researched memory technologies, racetrack memory is a non-volatile technology that allows high-data-density fabrication, making it a good fit for in-memory computing. However, integrating in-memory arithmetic circuits with memory cells affects both memory density and power efficiency. It remains challenging to build efficient in-memory arithmetic circuits on racetrack memory within area and energy constraints. To this end, we present an efficient in-memory convolutional neural network (CNN) accelerator optimized for use with racetrack memory. We design a series of fundamental arithmetic circuits as in-memory computing cells suited for multiply-and-accumulate operations. Moreover, we explore the design space of racetrack-memory-based systems and CNN model architectures, employing co-design to improve the efficiency and performance of CNN inference in racetrack memory while maintaining model accuracy. Our circuits and model-system co-optimization strategies achieve a small memory bank area with significant improvements in energy and performance for racetrack-memory-based embedded systems.
- Asia > Singapore (0.04)
- Europe (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Semiconductors & Electronics (0.67)
- Information Technology (0.67)
- Energy (0.46)
Full-Stack Optimization for CAM-Only DNN Inference
de Lima, João Paulo C., Khan, Asif Ali, Carro, Luigi, Castrillon, Jeronimo
The accuracy of neural networks has greatly improved across various domains over the past years. Their ever-increasing complexity, however, leads to prohibitively high energy demands and latency in von Neumann systems. Several computing-in-memory (CIM) systems have recently been proposed to overcome this, but trade-offs involving accuracy, hardware reliability, and scalability for large models remain a challenge. Additionally, for some CIM designs, the activation movement still requires considerable time and energy. This paper explores the combination of algorithmic optimizations for ternary weight neural networks and associative processors (APs) implemented using racetrack memory (RTM). We propose a novel compilation flow to optimize convolutions on APs by reducing their arithmetic intensity. By leveraging the benefits of RTM-based APs, this approach substantially reduces data transfers within the memory while addressing accuracy, energy efficiency, and reliability concerns. Concretely, our solution improves the energy efficiency of ResNet-18 inference on ImageNet by 7.5x compared to crossbar in-memory accelerators while retaining software accuracy.
- Europe > Germany > Saxony > Dresden (0.04)
- South America > Brazil > Rio Grande do Sul > Porto Alegre (0.04)
- Europe > Germany > Saxony > Leipzig (0.04)
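The arithmetic-intensity reduction mentioned in the abstract comes from ternary weights: with weights restricted to {-1, 0, +1}, a dot product needs no multiplications at all, only additions, subtractions, and skips. A hedged, illustrative sketch (not the paper's compilation flow; the actual work maps these operations onto associative processors in racetrack memory):

```python
def ternary_dot(weights, activations):
    """Dot product with ternary weights: no multiplications needed."""
    acc = 0
    for w, a in zip(weights, activations):
        if w == 1:
            acc += a      # add
        elif w == -1:
            acc -= a      # subtract
        # w == 0: skip entirely; no operation issued
    return acc

print(ternary_dot([1, 0, -1, 1], [3, 7, 2, 5]))  # 3 - 2 + 5 = 6
```

On an associative processor, the add/subtract passes can be applied to many activations in parallel inside the memory, which is where the reported energy savings over crossbar accelerators come from.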
Sustainable AI Processing at the Edge
Ollivier, Sébastien, Li, Sheng, Tang, Yue, Chaudhuri, Chayanika, Zhou, Peipei, Tang, Xulong, Hu, Jingtong, Jones, Alex K.
Deep neural networks have become a popular algorithm for a variety of applications on mobile devices, including smartphones, and more recently on connected and autonomous vehicles (CAVs), robots, unmanned aerial vehicles (UAVs), and other smart infrastructure. Convolutional Neural Networks (CNNs) have been demonstrated to solve these problems with relatively high accuracy. While there have been many proposals to improve the performance and energy efficiency of CNN inference, these algorithms are too compute- and data-intensive to execute directly on mobile nodes, which typically operate with limited computational and energy capabilities. Thus, edge servers, now often deployed in conjunction with advanced (e.g., 5G) wireless networks, have become a popular target for accelerating CNN inference. Moreover, due to their deployment in the field, edge servers must operate under size, weight, and power (SWaP) constraints while serving many concurrent requests from mobile clients. Thus, to accelerate CNNs, these edge servers often use energy-efficient accelerators, reduced precision, or both to achieve fast response times while balancing requests from multiple clients and maintaining a low operational energy cost. Recently, there has been a trend to push online training to edge server nodes to avoid communicating large datasets from edge to cloud servers [1]. However, online training typically requires much higher precision and floating-point computation compared to inference. Unfortunately, the proliferation of computing, in both the mobile devices and the edge servers themselves, can have negative environmental impacts.
- North America > United States > Texas (0.05)
- North America > United States > New York (0.05)
- North America > United States > California (0.05)
- Education > Educational Setting > Online (1.00)
- Energy > Renewable (0.94)
- Energy > Power Industry (0.93)
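The reduced-precision inference mentioned in the abstract typically means quantizing weights and activations to narrow integers such as int8. A minimal sketch of symmetric int8 quantization, shown here as one common convention rather than any specific edge accelerator's scheme (real deployments usually use per-channel scales and calibrated ranges):

```python
def quantize_int8(values):
    """Symmetric int8 quantization: map floats to [-128, 127] codes."""
    # Scale so the largest magnitude maps to 127; fall back to 1.0 for all-zero input.
    scale = max(abs(v) for v in values) / 127 or 1.0
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from int8 codes."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.0, 1.27]
q, s = quantize_int8(weights)
print(q)                 # int8 codes, e.g. [50, -127, 0, 127]
print(dequantize(q, s))  # approximately the original weights
```

Integer MACs are far cheaper in energy and area than floating-point ones, which is why inference tolerates this precision loss while the online training discussed above generally does not.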