Dense Associative Memory Through the Lens of Random Features

Neural Information Processing Systems

Dense Associative Memories are high storage capacity variants of Hopfield networks that can store a large number of memory patterns in the weights of a network of a given size. Their common formulations typically require storing each pattern in a separate set of synaptic weights, so the number of synaptic weights grows as new patterns are introduced. In this work we propose an alternative formulation of this class of models using random features, commonly used in kernel methods. In this formulation the number of the network's parameters remains fixed. At the same time, new memories can be added to the network by modifying existing weights.
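As a rough illustration of the fixed-parameter idea, here is a minimal sketch that assumes random Fourier features for a Gaussian kernel; the abstract does not say which feature map the authors use. The names `make_feature_map`, `RandomFeatureMemory`, `store`, and `kernel_sum` are hypothetical. Each stored pattern is folded into one weight vector of fixed length, so adding a memory modifies existing weights instead of allocating new ones.

```python
import math
import random


def make_feature_map(dim, num_features, seed=0):
    """Random Fourier features: phi(x) . phi(y) approximates exp(-|x - y|^2 / 2).

    Assumption: a Gaussian kernel; the abstract does not name a kernel.
    """
    rng = random.Random(seed)
    freqs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)

    def phi(x):
        return [
            scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)
        ]

    return phi


class RandomFeatureMemory:
    """Hypothetical memory whose parameter count does not depend on the number of patterns."""

    def __init__(self, dim, num_features, seed=0):
        self.phi = make_feature_map(dim, num_features, seed)
        self.weights = [0.0] * num_features  # fixed-size parameter vector

    def store(self, pattern):
        # Adding a memory only updates the existing weights in place.
        for k, feature in enumerate(self.phi(pattern)):
            self.weights[k] += feature

    def kernel_sum(self, query):
        # Approximates the sum of kernel similarities between query and all stored patterns.
        return sum(f * w for f, w in zip(self.phi(query), self.weights))


memory = RandomFeatureMemory(dim=3, num_features=256)
memory.store([1.0, 0.0, -1.0])
memory.store([0.5, 0.5, 0.5])
similarity = memory.kernel_sum([0.9, 0.1, -0.8])
```

The approximation improves as `num_features` grows, which is the usual trade-off for random-feature methods: parameter count is traded against fidelity to the exact kernel sum.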


Improve Language Model and Brain Alignment via Associative Memory

arXiv.org Artificial Intelligence

Associative memory integrates relevant information for comprehension in the human cognitive system. In this work, we seek to improve the alignment between language models and the human brain during speech processing by integrating associative memory. After verifying the alignment between language model and brain by mapping language model activations to brain activity, we feed the original text stimuli, expanded with simulated associative memory, to computational language models. We find that the alignment between language model and brain improves in brain regions closely related to associative memory processing. We also demonstrate that large language models align better with brain responses after specific supervised fine-tuning; to this end we build the Association dataset, containing 1000 samples of stories, with instructions encouraging associative memory as input and associated content as output.


Hardware-Adaptive and Superlinear-Capacity Memristor-based Associative Memory

arXiv.org Artificial Intelligence

Brain-inspired computing aims to mimic cognitive functions like associative memory, the ability to recall complete patterns from partial cues. Memristor technology offers promising hardware for such neuromorphic systems due to its potential for efficient in-memory analog computing. Hopfield Neural Networks (HNNs) are a classic model for associative memory, but implementations on conventional hardware suffer from efficiency bottlenecks, while prior memristor-based HNNs faced challenges with vulnerability to hardware defects due to offline training, limited storage capacity, and difficulty processing analog patterns. Here we introduce and experimentally demonstrate on integrated memristor hardware a new hardware-adaptive learning algorithm for associative memories that significantly improves defect tolerance and capacity, and naturally extends to scalable multilayer architectures capable of handling both binary and continuous patterns. Our approach achieves 3x effective capacity under 50% device faults compared to state-of-the-art methods. Furthermore, its extension to multilayer architectures enables superlinear capacity scaling (\(\propto N^{1.49}\) for binary patterns) and effective recalling of continuous patterns (\(\propto N^{1.74}\) scaling), as compared to linear capacity scaling for previous HNNs. It also provides flexibility to adjust capacity by tuning hidden neurons for the same-sized patterns. By leveraging the massive parallelism of the hardware enabled by synchronous updates, it reduces energy by 8.8x and latency by 99.7% for 64-dimensional patterns over asynchronous schemes, with greater improvements at scale. This promises the development of more reliable memristor-based associative memory systems and enables new applications research due to the significantly improved capacity, efficiency, and flexibility.


Beyond Disorder: Unveiling Cooperativeness in Multidirectional Associative Memories

arXiv.org Machine Learning

By leveraging tools from the statistical mechanics of complex systems, in these short notes we extend the architecture of a neural network for hetero-associative memory (called three-directional associative memories, TAM) to explore supervised and unsupervised learning protocols. In particular, by providing entropic-heterogeneous datasets to its various layers, we predict and quantify a new emergent phenomenon -- that we term layer's cooperativeness -- where the interplay of dataset entropies across the network's layers enhances their retrieval capabilities beyond those they would have without reciprocal influence. Naively, we would expect layers trained with less informative datasets to develop smaller retrieval regions than layers that experienced more information: this does not happen, and all the retrieval regions settle to the same amplitude, allowing for optimal retrieval performance globally. These cooperative dynamics mark a significant advancement in understanding emergent computational capabilities within disordered systems.


MeMo: Towards Language Models with Associative Memory Mechanisms

arXiv.org Artificial Intelligence

Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model editing, including forgetting texts. We experimented with the MeMo architecture, showing the memorization power of the one-layer and the multi-layer configurations.


In-context denoising with one-layer transformers: connections between attention and associative memory retrieval

arXiv.org Artificial Intelligence

We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
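The link between a gradient step and attention can be made concrete with a small sketch. It assumes the standard modern Hopfield energy E(x) = ½‖x‖² − (1/β) log Σ_μ exp(β ξ_μ·x), which may differ from the context-aware energy used in the paper; the function names are hypothetical. With step size 1, one gradient descent step on this energy returns exactly the softmax-weighted average of the memories, that is, an attention readout in which the context tokens act as both keys and values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(weights, vectors):
    size = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(size)]


def energy_gradient(query, memories, beta):
    # Gradient of E(x) = 0.5 |x|^2 - (1 / beta) * log sum_mu exp(beta * xi_mu . x)
    weights = softmax([beta * dot(m, query) for m in memories])
    retrieved = weighted_average(weights, memories)
    return [q - r for q, r in zip(query, retrieved)]


def gradient_step(query, memories, beta, step_size=1.0):
    grad = energy_gradient(query, memories, beta)
    return [q - step_size * g for q, g in zip(query, grad)]


def attention(query, keys, values, beta):
    weights = softmax([beta * dot(k, query) for k in keys])
    return weighted_average(weights, values)


context = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # context tokens as memories
noisy_query = [0.9, 0.2]
denoised = gradient_step(noisy_query, context, beta=4.0)
attended = attention(noisy_query, context, context, beta=4.0)
```

In this sketch `denoised` and `attended` agree up to floating-point rounding. The result is a blend of memories rather than an exact copy of one, which matches the abstract's point that a single step need not coincide with exact retrieval.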


ISAM-MTL: Cross-subject multi-task learning model with identifiable spikes and associative memory networks

arXiv.org Artificial Intelligence

Cross-subject variability in EEG degrades performance of current deep learning models, limiting the development of brain-computer interface (BCI). This paper proposes ISAM-MTL, which is a multi-task learning (MTL) EEG classification model based on identifiable spiking (IS) representations and associative memory (AM) networks. The proposed model treats EEG classification of each subject as an independent task and leverages cross-subject data training to facilitate feature sharing across subjects. ISAM-MTL consists of a spiking feature extractor that captures shared features across subjects and a subject-specific bidirectional associative memory network that is trained by Hebbian learning for efficient and fast within-subject EEG classification. ISAM-MTL integrates learned spiking neural representations with bidirectional associative memory for cross-subject EEG classification. The model employs label-guided variational inference to construct identifiable spike representations, enhancing classification accuracy. Experimental results on two BCI Competition datasets demonstrate that ISAM-MTL improves the average accuracy of cross-subject EEG classification while reducing performance variability among subjects. The model further exhibits the characteristics of few-shot learning and identifiable neural activity beneath EEG, enabling rapid and interpretable calibration for BCI systems.
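For readers unfamiliar with Hebbian-trained bidirectional associative memories, the following is a minimal sketch of the classic outer-product construction, not the ISAM-MTL implementation. The feature vectors stand in for spike-based representations and the class codes for labels; both, along with all function names, are hypothetical. Weights are accumulated in a single pass with no gradient-based training, which is what makes this kind of memory fast to fit per subject.

```python
def sign(value):
    return 1 if value >= 0 else -1


def hebbian_weights(features, codes):
    """Sum of outer products x y^T over all (feature, code) pairs."""
    rows, cols = len(features[0]), len(codes[0])
    weights = [[0] * cols for _ in range(rows)]
    for x, y in zip(features, codes):
        for i in range(rows):
            for j in range(cols):
                weights[i][j] += x[i] * y[j]
    return weights


def recall_forward(weights, x):
    """Feature vector -> class code."""
    cols = len(weights[0])
    return [sign(sum(weights[i][j] * x[i] for i in range(len(x)))) for j in range(cols)]


def recall_backward(weights, y):
    """Class code -> prototype feature vector."""
    return [sign(sum(w_ij * y_j for w_ij, y_j in zip(row, y))) for row in weights]


# Hypothetical bipolar feature vectors and class codes (assumptions for illustration).
features = [[1, 1, 1, 1, -1, -1, -1, -1], [1, -1, 1, -1, 1, -1, 1, -1]]
codes = [[1, 1, -1, -1], [1, -1, 1, -1]]
W = hebbian_weights(features, codes)

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # first feature vector with one flipped entry
predicted_code = recall_forward(W, noisy)
```

Because the example patterns are mutually orthogonal, recall is exact in both directions; with correlated patterns, crosstalk between stored pairs limits capacity.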


Review for NeurIPS paper: PyGlove: Symbolic Programming for Automated Machine Learning

Neural Information Processing Systems

Summary and Contributions: The paper introduces an AutoML library that tries to find its own sweet spot in the large ecosystem of newly minted AutoML libraries. The paper introduces a symbolic frontend to build neural network models, with simple fundamental constructs that provide choice insertions. Unlike all other packages that I have seen and reviewed, such as Keras Tuner, NNI, AutoGluon, Optuna (btw reference missing to Optuna, you should consider adding), this paper introduces something innovative and elegant. All these other packages consistently suffer from the code of the model definition getting ugly and unwieldy really quickly when you have to introduce model structure searches, and when there's interaction between structure searches and size searches. In this paper, the authors cleanly separate model structure definitions from each layer's hyperparameter choices.


Review for NeurIPS paper: PyGlove: Symbolic Programming for Automated Machine Learning

Neural Information Processing Systems

The reviewers generally agree that the design choices of this framework for AutoML are judicious and hit a "sweet spot". This combination of language/tooling design is of great value to expose to large swathes of the NeurIPS community. The rebuttal persuasively addresses the reviewers' concerns about the evaluation and utility of this proposal, and the response to R4 is also reassuring. We look forward to the authors' final version of the paper, incorporating the proposed improvements.


Reviews: Dense Associative Memory for Pattern Recognition

Neural Information Processing Systems

The theoretical contribution presented in 291--310 is a welcome insight on the computational power of ReLUs. The experimental results for rectified polynomial units reported in figures 2 and 3 are interesting and apparently novel, even in the context of standard feedforward multi-layer networks. Since 291--297 is a central point of the paper, it should be expanded and better justified. Furthermore, the simple capacity analysis developed on p. 3 for the polynomial energy function is invoked here for the rectified polynomial energy function. This has to be justified. The paper starts from and mostly focuses on the associative memory (Hamiltonian) formulation, but then the findings are restricted to one-step retrieval.
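To make the terms in this review concrete, here is a minimal sketch of one-step retrieval with a rectified polynomial energy, assuming the commonly cited form E = −Σ_μ F(ξ_μ·σ) with F(x) = xⁿ for x > 0 and 0 otherwise. The review itself does not state the update rule, so the rule below and all function names are assumptions. Each spin is set to whichever sign gives the lower energy with the other spins held fixed, and all spins are updated once from the same initial state.

```python
def rectified_poly(x, n):
    """F(x) = x^n for x > 0, else 0."""
    return x ** n if x > 0 else 0


def update_spin(state, memories, i, n):
    # Compare the energy with spin i set to +1 against spin i set to -1.
    total = 0
    for xi in memories:
        rest = sum(xi[j] * state[j] for j in range(len(state)) if j != i)
        total += rectified_poly(rest + xi[i], n) - rectified_poly(rest - xi[i], n)
    return 1 if total >= 0 else -1


def one_step_retrieve(state, memories, n=3):
    """Single synchronous update: every spin is computed from the initial state."""
    return [update_spin(state, memories, i, n) for i in range(len(state))]


memories = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
]
corrupted = [-1, 1, 1, 1, -1, -1, -1, -1]  # first memory with its first spin flipped
recovered = one_step_retrieve(corrupted, memories, n=3)
```

The rectification discards memories that are anti-correlated with the current state, so in this toy case only the closest memory contributes to the update.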