Programming Languages
Temporal Complexity and Self-Organization in an Exponential Dense Associative Memory Model
Cafiso, Marco, Paradisi, Paolo
Dense Associative Memory (DAM) models generalize the classical Hopfield model by incorporating n-body or exponential interactions that greatly enhance storage capacity. While the criticality of DAM models has been largely investigated, mainly within a statistical equilibrium picture, little attention has been devoted to the temporal self-organizing behavior induced by learning. In this work, we investigate the behavior of a stochastic exponential DAM (SEDAM) model through the lens of Temporal Complexity (TC), a framework that characterizes complex systems by intermittent transition events between order and disorder and by scale-free temporal statistics. Transition events associated with birth-death of neural avalanche structures are exploited for the TC analyses and compared with analogous transition events based on coincidence structures. We systematically explore how TC indicators depend on control parameters, i.e., noise intensity and memory load. Our results reveal that the SEDAM model exhibits regimes of complex intermittency characterized by nontrivial temporal correlations and scale-free behavior, indicating the spontaneous emergence of self-organizing dynamics. These regimes emerge in small intervals of noise intensity values, which, in agreement with the extended criticality concept, never shrink to a single critical point. Further, the noise intensity range needed to reach the critical region, where self-organizing behavior emerges, slightly decreases as the memory load increases. This study highlights the relevance of TC as a complementary framework for understanding learning and information processing in artificial and biological neural systems, revealing the link between the memory load and the self-organizing capacity of the network.
- North America > United States > New York (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Italy > Tuscany > Pisa Province > Pisa (0.04)
- (4 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.61)
CAMformer: Associative Memory is All You Need
Molom-Ochir, Tergel, Morris, Benjamin F., Horton, Mark, Wei, Chiyue, Guo, Cong, Taylor, Brady, Liu, Peter, Wang, Shan X., Fan, Deliang, Li, Hai Helen, Chen, Yiran
Transformers face scalability challenges due to the quadratic cost of attention, which involves dense similarity computations between queries and keys. We propose CAMformer, a novel accelerator that reinterprets attention as an associative memory operation and computes attention scores using a voltage-domain Binary Attention Content Addressable Memory (BA-CAM). This enables constant-time similarity search through analog charge sharing, replacing digital arithmetic with physical similarity sensing. CAMformer integrates hierarchical two-stage top-k filtering, pipelined execution, and high-precision contextualization to achieve both algorithmic accuracy and architectural efficiency. Evaluated on BERT and Vision Transformer workloads, CAMformer achieves over 10x energy efficiency, up to 4x higher throughput, and 6-8x lower area compared to state-of-the-art accelerators--while maintaining near-lossless accuracy.
- North America > United States > Arizona (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Research Report (0.50)
- Overview (0.46)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.71)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.61)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Associative Memory via a Sparse Recovery Model
Arya Mazumdar, Ankit Singh Rawat
An associative memory is a structure learned from a dataset M of vectors (signals) in a way such that, given a noisy version of one of the vectors as input, the nearest valid vector from M (nearest neighbor) is provided as output, preferably via a fast iterative algorithm. Traditionally, binary (or q -ary) Hopfield neural networks are used to model the above structure. In this paper, for the first time, we propose a model of associative memory based on sparse recovery of signals. Our basic premise is simple. For a dataset, we learn a set of linear constraints that every vector in the dataset must satisfy. Provided these linear constraints possess some special properties, it is possible to cast the task of finding nearest neighbor as a sparse recovery problem. Assuming generic random models for the dataset, we show that it is possible to store super-polynomial or exponential number of n -length vectors in a neural network of size O ( n) . Furthermore, given a noisy version of one of the stored vectors corrupted in near-linear number of coordinates, the vector can be correctly recalled using a neurally feasible algorithm.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.85)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)
- South America > Brazil (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- (4 more...)
- South America > Brazil (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- (4 more...)
Muon Outperforms Adam in Tail-End Associative Memory Learning
Wang, Shuche, Zhang, Fengzhuo, Li, Jiaxiang, Du, Cunxiao, Du, Chao, Pang, Tianyu, Yang, Zhuoran, Hong, Mingyi, Tan, Vincent Y. F.
The Muon optimizer is consistently faster than Adam in training Large Language Models (LLMs), yet the mechanism underlying its success remains unclear. This paper demystifies this mechanism through the lens of associative memory. By ablating the transformer components optimized by Muon, we reveal that the associative memory parameters of LLMs, namely the Value and Output (VO) attention weights and Feed-Forward Networks (FFNs), are the primary contributors to Muon's superiority. Motivated by this associative memory view, we then explain Muon's superiority on real-world corpora, which are intrinsically heavy-tailed: a few classes (tail classes) appear far less frequently than others. The superiority is explained through two key properties: (i) its update rule consistently yields a more isotropic singular spectrum than Adam; and as a result, (ii) on heavy-tailed data, it optimizes tail classes more effectively than Adam. Beyond empirical evidence, we theoretically confirm these findings by analyzing a one-layer associative memory model under class-imbalanced data. We prove that Muon consistently achieves balanced learning across classes regardless of feature embeddings, whereas Adam can induce large disparities in learning errors depending on embedding properties. In summary, our empirical observations and theoretical analyses reveal Muon's core advantage: its update rule aligns with the outer-product structure of linear associative memories, enabling more balanced and effective learning of tail classes in heavy-tailed distributions than Adam.
- Asia > Middle East > Jordan (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- North America > United States > New York (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Modern Methods in Associative Memory
Krotov, Dmitry, Hoover, Benjamin, Ram, Parikshit, Pham, Bao
Associative Memories like the famous Hopfield Networks are elegant models for describing fully recurrent neural networks whose fundamental job is to store and retrieve information. In the past few years they experienced a surge of interest due to novel theoretical results pertaining to their information storage capabilities, and their relationship with SOTA AI architectures, such as Transformers and Diffusion Models. These connections open up possibilities for interpreting the computation of traditional AI networks through the theoretical lens of Associative Memories. Additionally, novel Lagrangian formulations of these networks make it possible to design powerful distributed models that learn useful representations and inform the design of novel architectures. This tutorial provides an approachable introduction to Associative Memories, emphasizing the modern language and methods used in this area of research, with practical hands-on mathematical derivations and coding notebooks.
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Associative Memory via a Sparse Recovery Model
Arya Mazumdar, Ankit Singh Rawat
An associative memory is a structure learned from a dataset M of vectors (signals) in a way such that, given a noisy version of one of the vectors as input, the nearest valid vector from M (nearest neighbor) is provided as output, preferably via a fast iterative algorithm. Traditionally, binary (or q -ary) Hopfield neural networks are used to model the above structure. In this paper, for the first time, we propose a model of associative memory based on sparse recovery of signals. Our basic premise is simple. For a dataset, we learn a set of linear constraints that every vector in the dataset must satisfy. Provided these linear constraints possess some special properties, it is possible to cast the task of finding nearest neighbor as a sparse recovery problem. Assuming generic random models for the dataset, we show that it is possible to store super-polynomial or exponential number of n -length vectors in a neural network of size O ( n) . Furthermore, given a noisy version of one of the stored vectors corrupted in near-linear number of coordinates, the vector can be correctly recalled using a neurally feasible algorithm.
- North America > United States > Texas > Travis County > Austin (0.14)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Minnesota (0.04)
- Europe > Switzerland > Basel-City > Basel (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.85)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.85)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.56)
Dense associative memory on the Bures-Wasserstein space
Tankala, Chandan, Balasubramanian, Krishnakumar
Dense associative memories (DAMs) store and retrieve patterns via energy-functional fixed points, but existing models are limited to vector representations. We extend DAMs to probability distributions equipped with the 2-Wasserstein distance, focusing mainly on the Bures-Wasserstein class of Gaussian densities. Our framework defines a log-sum-exp energy over stored distributions and a retrieval dynamics aggregating optimal transport maps in a Gibbs-weighted manner. Stationary points correspond to self-consistent Wasserstein barycenters, generalizing classical DAM fixed points. We prove exponential storage capacity, provide quantitative retrieval guarantees under Wasserstein perturbations, and validate the model on synthetic and real-world distributional tasks. This work elevates associative memory from vectors to full distributions, bridging classical DAMs with modern generative modeling and enabling distributional storage and retrieval in memory-augmented learning.
- North America > United States > Oregon > Lane County > Eugene (0.04)
- North America > United States > California > Yolo County > Davis (0.04)
- Asia > Japan > Honshū > Kansai > Osaka Prefecture > Osaka (0.04)
Gated Associative Memory: A Parallel O(N) Architecture for Efficient Sequence Modeling
The Transformer architecture, underpinned by the self-attention mechanism, has become the de facto standard for sequence modeling tasks. However, its core computational primitive scales quadratically with sequence length (O(N^2)), creating a significant bottleneck for processing long contexts. In this paper, we propose the Gated Associative Memory (GAM) network, a novel, fully parallel architecture for sequence modeling that exhibits linear complexity (O(N)) with respect to sequence length. The GAM block replaces the self-attention layer with two parallel pathways: a causal convolution to efficiently capture local, position-dependent context, and a parallel associative memory retrieval mechanism to model global, content-based patterns. These pathways are dynamically fused using a gating mechanism, allowing the model to flexibly combine local and global information for each token. We implement GAM from scratch and conduct a rigorous comparative analysis against a standard Transformer model and a modern linear-time baseline (Mamba) on the WikiText-2 benchmark, as well as against the Transformer on the TinyStories dataset. Our experiments demonstrate that GAM is consistently faster, outperforming both baselines on training speed, and achieves a superior or competitive final validation perplexity across all datasets, establishing it as a promising and efficient alternative for sequence modeling.
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.94)
- Information Technology > Artificial Intelligence > Systems & Languages > Programming Languages (0.84)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)