Programming Languages
Scaling Laws for Associative Memories
Cabannes, Vivien, Dohmatob, Elvis, Bietti, Alberto
Learning arguably involves the discovery and memorization of abstract rules. The aim of this paper is to study associative memory mechanisms. Our model is based on high-dimensional matrices consisting of outer products of embeddings, which relates to the inner layers of transformer language models. We derive precise scaling laws with respect to sample size and parameter size, and discuss the statistical efficiency of different estimators, including optimization-based algorithms. We provide extensive numerical experiments to validate and interpret theoretical results, including fine-grained visualizations of the stored memory associations.
Memory in Plain Sight: A Survey of the Uncanny Resemblances between Diffusion Models and Associative Memories
Hoover, Benjamin, Strobelt, Hendrik, Krotov, Dmitry, Hoffman, Judy, Kira, Zsolt, Chau, Duen Horng
Diffusion Models (DMs) have recently set state-of-the-art on many generation benchmarks. However, there are myriad ways to describe them mathematically, which makes it difficult to develop a simple understanding of how they work. In this survey, we provide a concise overview of DMs from the perspective of dynamical systems and Ordinary Differential Equations (ODEs) which exposes a mathematical connection to the highly related yet often overlooked class of energy-based models, called Associative Memories (AMs). Energy-based AMs are a theoretical framework that behave much like denoising DMs, but they enable us to directly compute a Lyapunov energy function on which we can perform gradient descent to denoise data. We then summarize the 40 year history of energy-based AMs, beginning with the original Hopfield Network, and discuss new research directions for AMs and DMs that are revealed by characterizing the extent of their similarities and differences
Explainable AI Insights for Symbolic Computation: A case study on selecting the variable ordering for cylindrical algebraic decomposition
Pickering, Lynn, Almajano, Tereso Del Rio, England, Matthew, Cohen, Kelly
In recent years there has been increased use of machine learning (ML) techniques within mathematics, including symbolic computation where it may be applied safely to optimise or select algorithms. This paper explores whether using explainable AI (XAI) techniques on such ML models can offer new insight for symbolic computation, inspiring new implementations within computer algebra systems that do not directly call upon AI tools. We present a case study on the use of ML to select the variable ordering for cylindrical algebraic decomposition. It has already been demonstrated that ML can make the choice well, but here we show how the SHAP tool for explainability can be used to inform new heuristics of a size and complexity similar to those human-designed heuristics currently commonly used in symbolic computation.
Classification and Generation of real-world data with an Associative Memory Model
Simas, Rodrigo, Sa-Couto, Luis, Wichert, Andreas
Drawing from memory the face of a friend you have not seen in years is a difficult task. However, if you happen to cross paths, you would easily recognize each other. The biological memory is equipped with an impressive compression algorithm that can store the essential, and then infer the details to match perception. The Willshaw Memory is a simple abstract model for cortical computations which implements mechanisms of biological memories. Using our recently proposed sparse coding prescription for visual patterns, this model can store and retrieve an impressive amount of real-world data in a fault-tolerant manner. In this paper, we extend the capabilities of the basic Associative Memory Model by using a Multiple-Modality framework. In this setting, the memory stores several modalities (e.g., visual, or textual) of each pattern simultaneously. After training, the memory can be used to infer missing modalities when just a subset is perceived. Using a simple encoder-memory-decoder architecture, and a newly proposed iterative retrieval algorithm for the Willshaw Model, we perform experiments on the MNIST dataset. By storing both the images and labels as modalities, a single Memory can be used not only to retrieve and complete patterns but also to classify and generate new ones. We further discuss how this model could be used for other learning tasks, thus serving as a biologically-inspired framework for learning.
Improved Logical Reasoning of Language Models via Differentiable Symbolic Programming
Zhang, Hanlin, Huang, Jiani, Li, Ziyang, Naik, Mayur, Xing, Eric
Pre-trained large language models (LMs) struggle to perform logical reasoning reliably despite advances in scale and compositionality. In this work, we tackle this challenge through the lens of symbolic programming. We propose DSR-LM, a Differentiable Symbolic Reasoning framework where pre-trained LMs govern the perception of factual knowledge, and a symbolic module performs deductive reasoning. In contrast to works that rely on hand-crafted logic rules, our differentiable symbolic reasoning framework efficiently learns weighted rules and applies semantic loss to further improve LMs. DSR-LM is scalable, interpretable, and allows easy integration of prior knowledge, thereby supporting extensive symbolic programming to robustly derive a logical conclusion. The results of our experiments suggest that DSR-LM improves the logical reasoning abilities of pre-trained language models, resulting in a significant increase in accuracy of over 20% on deductive reasoning benchmarks. Furthermore, DSR-LM outperforms a variety of competitive baselines when faced with systematic changes in sequence length.
Performance Measures for Associative Memories that Learn and Forget
Recently, many modifications to the McCulloch/Pitts model have been proposed where both learning and forgetting occur. Given that the network never saturates (ceases to function effectively due to an overload of information), the learning updates can con(cid:173) tinue indefinitely. For these networks, we need to introduce performance measmes in addi(cid:173) tion to the information capacity to evaluate the different networks. We mathematically define quantities such as the plasticity of a network, the efficacy of an information vector, and the probability of network saturation. From these quantities we analytically compare different networks.
Capacity for Patterns and Sequences in Kanerva's SDM as Compared to Other Associative Memory Models
The information capacity of Kanerva's Sparse, Distributed Memory (SDM) and Hopfield-type neural networks is investigated. Under the approximations used here, it is shown that the to(cid:173) tal information stored in these systems is proportional to the number connections in the net(cid:173) work. The proportionality constant is the same for the SDM and HopJreld-type models in(cid:173) dependent of the particular model, or the order of the model. The approximations are checked numerically. This same analysis can be used to show that the SDM can store se(cid:173) quences of spatiotemporal patterns, and the addition of time-delayed connections allows the retrieval of context dependent temporal patterns.
The Capacity of the Kanerva Associative Memory is Exponential
It is shown by sphere packing arguments that as the address length increases. This exponential grovth in capacity can actually be achieved by the Kanerva associative memory. Formulas for these op.timal values are provided. The exponential grovth in capacity for the Kanerva associative memory contrasts sharply vith the sub-linear grovth in capacity for the Hopfield associative memory. Our model of an associative memory is the folloving.
Invariant Object Recognition Using a Distributed Associative Memory
This paper describes an approach to 2-dimensional object recognition. Complex-log con(cid:173) formal mapping is combined with a distributed associative memory to create a system which recognizes objects regardless of changes in rotation or scale. Recalled information from the memorized database is used to classify an object, reconstruct the memorized ver(cid:173) sion of the object, and estimate the magnitude of changes in scale or rotation. The system response is resistant to moderate amounts of noise and occlusion. Several experiments, us(cid:173) ing real, gray scale images, are presented to show the feasibility of our approach.
REFLEXIVE ASSOCIATIVE MEMORIES
In the synchronous discrete model, the average memory capacity of bidirectional associative memories (BAMs) is compared with that of Hopfield memories, by means of a calculat10n of the percentage of good recall for 100 random BAMs of dimension 64x64, for different numbers of stored vectors. The memory capac1ty Is found to be much smal1er than the Kosko upper bound, which Is the lesser of the two dimensions of the BAM. On the average, a 64x64 BAM has about 68 % of the capacity of the corresponding Hopfield memory with the same number of neurons. The memory capacity limitations are due to spurious stable states, which arise In BAMs In much the same way as in Hopfleld memories. Occurrence of spurious stable states can be avoided by replacing the thresholding in the backlayer of the BAM by another nonl1near process, here called "Dominant Label Selection" (DLS).