mram
Emergence of Fixational and Saccadic Movements in a Multi-Level Recurrent Attention Model for Vision
Pan, Pengcheng, Shogo, Yonekura, Kuniyoshi, Yasuo
Inspired by foveal vision, hard attention models promise interpretability and parameter economy. However, existing models such as the Recurrent Model of Visual Attention (RAM) and the Deep Recurrent Attention Model (DRAM) fail to model the hierarchy of the human visual system, which compromises their visual exploration dynamics. As a result, they tend to produce attention that is either overly fixational or excessively saccadic, diverging from human eye movement behavior. In this paper, we propose the Multi-Level Recurrent Attention Model (MRAM), a novel hard attention framework that explicitly models the neural hierarchy of human visual processing. By decoupling glimpse location generation and task execution into two recurrent layers, MRAM exhibits an emergent balance between fixational and saccadic movements. Our results show that MRAM not only achieves more human-like attention dynamics, but also consistently outperforms CNN, RAM and DRAM baselines on standard image classification benchmarks.
An Efficient Memory Module for Graph Few-Shot Class-Incremental Learning
Li, Dong, Zhang, Aijia, Gao, Junqi, Qi, Biqing
Incremental graph learning has gained significant attention for its ability to address the catastrophic forgetting problem in graph representation learning. However, traditional methods often rely on a large number of labels for node classification, which is impractical in real-world applications. This makes few-shot incremental learning on graphs a pressing need. Current methods typically require extensive training samples from meta-learning to build memory and perform intensive fine-tuning of GNN parameters, leading to high memory consumption and potential loss of previously learned knowledge. To tackle these challenges, we introduce Mecoin, an efficient method for building and maintaining memory. Mecoin employs Structured Memory Units to cache prototypes of learned categories, as well as Memory Construction Modules to update these prototypes for new categories through interactions between the nodes and the cached prototypes. Additionally, we have designed a Memory Representation Adaptation Module (MRaM) to store probabilities associated with each class prototype, reducing the need for parameter fine-tuning and lowering the forgetting rate. When a sample matches its corresponding class prototype, the relevant probabilities are retrieved from the MRaM. Knowledge is then distilled back into the GNN through a Graph Knowledge Distillation Module, preserving the model's memory. We analyze the effectiveness of Mecoin in terms of generalization error and explore the impact of different distillation strategies on model performance through experiments and VC-dimension analysis. Compared to other related works, Mecoin shows superior performance in accuracy and forgetting rate. Our code is publicly available at https://github.com/Arvin0313/Mecoin-GFSCIL.git.
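The idea of caching one prototype per learned class and retrieving stored probabilities on a match can be illustrated with a minimal sketch. This is not Mecoin's implementation: the class name `PrototypeMemory`, the use of a mean embedding as the prototype, and nearest-prototype matching by Euclidean distance are all assumptions made for illustration.

```python
import math


class PrototypeMemory:
    """Hypothetical prototype cache; not the Mecoin code base."""

    def __init__(self):
        self.prototypes = {}  # class label -> prototype vector
        self.probs = {}       # class label -> stored class-probability vector

    def add_class(self, label, embeddings, class_probs):
        # Assumption: the prototype is the mean of the few-shot embeddings.
        n = len(embeddings)
        dim = len(embeddings[0])
        self.prototypes[label] = [
            sum(e[i] for e in embeddings) / n for i in range(dim)
        ]
        self.probs[label] = list(class_probs)

    def retrieve(self, embedding):
        # Match the sample to its nearest cached prototype and return
        # the probabilities stored alongside that prototype.
        label = min(
            self.prototypes,
            key=lambda c: math.dist(embedding, self.prototypes[c]),
        )
        return label, self.probs[label]


memory = PrototypeMemory()
memory.add_class("a", [[0.0, 0.0], [2.0, 0.0]], [0.9, 0.1])
memory.add_class("b", [[10.0, 10.0]], [0.2, 0.8])
print(memory.retrieve([1.5, 0.5]))
```

In this sketch, adding class "b" leaves the cached entry for class "a" untouched, which is the property that makes a prototype memory attractive when new classes arrive incrementally.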
UpDLRM: Accelerating Personalized Recommendation using Real-World PIM Architecture
Chen, Sitian, Tan, Haobin, Zhou, Amelie Chi, Li, Yusen, Balaji, Pavan
Deep Learning Recommendation Models (DLRMs) have gained popularity in recommendation systems due to their effectiveness in handling large-scale recommendation tasks. The embedding layers of DLRMs have become the performance bottleneck due to their intensive demands on memory capacity and memory bandwidth. In this paper, we propose UpDLRM, which utilizes real-world processing-in-memory (PIM) hardware, the UPMEM DPU, to boost the memory bandwidth and reduce recommendation latency. The parallel nature of the DPU memory can provide high aggregated bandwidth for the large number of irregular memory accesses in embedding lookups, thus offering great potential to reduce the inference latency. To fully utilize the DPU memory bandwidth, we further study the embedding table partitioning problem to achieve good workload balance and efficient data caching. Evaluations using real-world datasets show that UpDLRM achieves much lower inference time for DLRM compared to both CPU-only and CPU-GPU hybrid counterparts.
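To make the workload-balance goal of embedding table partitioning concrete, the sketch below assigns embedding rows to memory units with a generic greedy heuristic (heaviest row first, onto the least-loaded unit). This is not UpDLRM's partitioning algorithm; the function name `balance_rows`, the per-row access counts, and the heuristic itself are assumptions chosen only to show what balancing lookups across DPUs means.

```python
import heapq


def balance_rows(access_counts, num_dpus):
    """Greedily assign rows to DPUs so that lookup load is roughly even.

    access_counts[r] is a hypothetical number of lookups for row r.
    Returns (row -> dpu mapping, per-DPU total load).
    """
    # Min-heap of (current load, dpu id): the least-loaded DPU pops first.
    heap = [(0, dpu) for dpu in range(num_dpus)]
    heapq.heapify(heap)

    assignment = {}
    # Place the most frequently accessed rows first.
    rows = sorted(range(len(access_counts)), key=lambda r: -access_counts[r])
    for row in rows:
        load, dpu = heapq.heappop(heap)
        assignment[row] = dpu
        heapq.heappush(heap, (load + access_counts[row], dpu))

    loads = [0] * num_dpus
    for row, dpu in assignment.items():
        loads[dpu] += access_counts[row]
    return assignment, loads


assignment, loads = balance_rows([90, 10, 40, 40, 10, 10], num_dpus=2)
print(assignment, loads)
```

A skewed access pattern like the one above, where one row dominates, is the case in which a naive contiguous split of the table would leave one unit doing most of the work.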
Vega: A 10-Core SoC for IoT End-Nodes with DNN Acceleration and Cognitive Wake-Up From MRAM-Based State-Retentive Sleep Mode
Rossi, Davide, Conti, Francesco, Eggimann, Manuel, Di Mauro, Alfio, Tagliavini, Giuseppe, Mach, Stefan, Guermandi, Marco, Pullini, Antonio, Loi, Igor, Chen, Jie, Flamand, Eric, Benini, Luca
The Internet-of-Things requires end-nodes with ultra-low-power always-on capability for a long battery lifetime, as well as high performance, energy efficiency, and extreme flexibility to deal with complex and fast-evolving near-sensor analytics algorithms (NSAAs). We present Vega, an IoT end-node SoC capable of scaling from a 1.7 $\mathrm{\mu}$W fully retentive cognitive sleep mode up to 32.2 GOPS (@ 49.4 mW) peak performance on NSAAs, including mobile DNN inference, exploiting 1.6 MB of state-retentive SRAM, and 4 MB of non-volatile MRAM. To meet the performance and flexibility requirements of NSAAs, the SoC features 10 RISC-V cores: one core for SoC and IO management and a 9-core cluster supporting multi-precision SIMD integer and floating-point computation. Vega achieves SoA-leading efficiency of 615 GOPS/W on 8-bit INT computation (boosted to 1.3 TOPS/W for 8-bit DNN inference with hardware acceleration). On floating-point (FP) computation, it achieves SoA-leading efficiency of 79 and 129 GFLOPS/W on 32- and 16-bit FP, respectively. Two programmable machine-learning (ML) accelerators boost energy efficiency in cognitive sleep and active states, respectively.
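The efficiency figures quoted above can be read as energy per operation, since 1 GOPS/W corresponds to 1000 pJ per operation. The short sketch below performs only that unit conversion on the numbers in the abstract; the function name `picojoules_per_op` is an illustrative choice, and the results are derived values, not figures reported by the authors.

```python
def picojoules_per_op(gops_per_watt):
    """Convert an efficiency in GOPS/W (or GFLOPS/W) to pJ per operation."""
    # 1 W / (1e9 ops/s) = 1e-9 J/op = 1000 pJ/op, so divide 1000 by the figure.
    return 1000.0 / gops_per_watt


# Efficiency figures as quoted in the abstract (1.3 TOPS/W = 1300 GOPS/W).
figures = {
    "8-bit INT": 615,
    "8-bit DNN with acceleration": 1300,
    "32-bit FP": 79,
    "16-bit FP": 129,
}
for name, efficiency in figures.items():
    print(f"{name}: {picojoules_per_op(efficiency):.2f} pJ/op")
```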
EETimes - Memory Technologies Confront Edge AI's Diverse Challenges
With the rise of AI at the edge comes a whole host of new requirements for memory systems. Can today's memory technologies live up to the stringent demands of this challenging new application, and what do emerging memory technologies promise for edge AI in the long-term? The first thing to realize is that there is no standard "edge AI" application; the edge in its broadest interpretation covers all AI-enabled electronic systems outside the cloud. That might include "near edge," which generally covers enterprise data centers and on-premise servers. Further out are applications like computer vision for autonomous driving.
How "green" is your Artificial Intelligence?
Artificial intelligence (AI) systems face a set of conflicting goals: being accurate (consuming large amounts of computational power and electrical power) and being accessible (being lower in cost, less computationally intensive, and less power-hungry). Unfortunately, many of today's AI implementations are environmentally unsustainable. Improvements in AI energy efficiency will be driven by several factors, including more efficient algorithms, more efficient computing architectures, and more efficient components. It's necessary to measure and track the energy consumption of AI systems to identify any improvements in energy efficiency. One example of the increasing awareness of the importance of energy consumption in AI systems is that the ULPMark (ultra-low power) benchmark line from EEMBC is now adding ML inference and developing a new benchmark, the ULPMark-ML.
Could a strange new memory chip unlock mysteries of AI? ZDNet
Modern artificial intelligence lacks a strong theoretical basis, and so it's often a shrug of the shoulders why it works at all (or, oftentimes, doesn't entirely work). One of the deepest mysteries of deep learning is one of its most brilliant successes, what's known as stochastic gradient descent. Stochasticity, the process of randomly picking out examples of data, has yielded breakthroughs in image recognition and other deep learning tasks. And now, one computer chip company thinks they may have a kind of machine for stochasticity, a chip whose power comes from randomness. It might not lead to a theory of why machine learning works, but it might lead to new breakthroughs in what stochasticity can achieve.
Global Big Data Conference
At the 2019 Semicon Conference, Applied Materials (AMAT) had a day-long seminar focused on technology, particularly memory, for artificial intelligence (AI) applications. In addition to talks by AI experts, the company also talked about their tools for manufacturing magnetic random access memory (MRAM) as well as resistive random access memory (RRAM) and Phase Change Memory (PCM). A workshop at Stanford in August will explore emerging memories enabling artificial intelligence, especially for embedded products, such as IoT devices. Gary Dickerson from Applied Materials gave a kick-off talk at the seminar. He talked about the growth of data and the importance of memory to support data centers as well as the edge.