hopfield
Self-Evidencing Through Hierarchical Gradient Decomposition: A Dissipative System That Maintains Non-Equilibrium Steady-State by Minimizing Variational Free Energy
The Free Energy Principle (FEP) states that self-organizing systems must minimize variational free energy to persist (Friston, 2010, 2019), but the path from principle to implementable algorithm has remained unclear. We present a constructive proof that the FEP can be realized through exact local credit assignment. The system decomposes gradient computation hierarchically: spatial credit via feedback alignment, temporal credit via eligibility traces, and structural credit via a Trophic Field Map (TFM) that estimates expected gradient magnitude for each connection block. We prove these mechanisms are exact at their respective levels and validate the central claim empirically: the TFM achieves 0.9693 Pearson correlation with oracle gradients. This exactness produces emergent capabilities including 98.6% retention after task interference, autonomous recovery from 75% structural damage, self-organized criticality (spectral radius ρ ≈ 1.0), and sample-efficient reinforcement learning on continuous control tasks without replay buffers. The architecture unifies Prigogine's dissipative structures (Prigogine, 1977), Friston's free energy minimization (Friston, 2010), and Hopfield's attractor dynamics (Hopfield, 1982; Amit et al., 1985a,b), demonstrating that exact hierarchical inference over network topology can be implemented with local, biologically plausible rules.
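The spatial-credit mechanism named in the abstract, feedback alignment, replaces the transpose of the forward weights with a fixed random feedback matrix in the backward pass, making the weight updates purely local. A minimal NumPy sketch of that idea alone (not the paper's implementation; the network shape, sizes, and learning rate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two-layer network: x -> h = tanh(W1 x) -> y = W2 h
n_in, n_hid, n_out = 4, 8, 2
W1 = rng.normal(scale=0.5, size=(n_hid, n_in))
W2 = rng.normal(scale=0.5, size=(n_out, n_hid))
# Feedback alignment: a fixed random matrix B replaces W2.T
# when propagating the output error to the hidden layer.
B = rng.normal(scale=0.5, size=(n_hid, n_out))

def step(x, target, lr=0.05):
    global W1, W2
    h = np.tanh(W1 @ x)
    y = W2 @ h
    e = y - target                      # output error
    delta_h = (B @ e) * (1 - h**2)     # local rule: random feedback, not W2.T
    W2 -= lr * np.outer(e, h)
    W1 -= lr * np.outer(delta_h, x)
    return 0.5 * np.sum(e**2)

x = rng.normal(size=n_in)
target = np.array([1.0, -1.0])
losses = [step(x, target) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

Despite the feedback weights never matching the forward weights, the loss still decreases: the forward weights gradually align with the fixed feedback, which is what makes the rule biologically plausible.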
In-Context Algorithm Emulation in Fixed-Weight Transformers
Hu, Jerry Yao-Chieh, Liu, Hude, Zhang, Jennifer Yuntong, Liu, Han
We prove that a minimal Transformer architecture with frozen weights is capable of emulating a broad class of algorithms by in-context prompting. In particular, for any algorithm implementable by a fixed-weight attention head (e.g. one-step gradient descent or linear/ridge regression), there exists a prompt that drives a two-layer softmax attention module to reproduce the algorithm's output with arbitrary precision. This guarantee extends even to a single-head attention layer (using longer prompts if necessary), achieving architectural minimality. Our key idea is to construct prompts that encode an algorithm's parameters into token representations, creating sharp dot-product gaps that force the softmax attention to follow the intended computation. This construction requires no feed-forward layers and no parameter updates. All adaptation happens through the prompt alone. These findings forge a direct link between in-context learning and algorithmic emulation, and offer a simple mechanism for large Transformers to serve as prompt-programmable libraries of algorithms. They illuminate how GPT-style foundation models may swap algorithms via prompts alone, establishing a form of algorithmic universality in modern Transformer models.
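The role of the "sharp dot-product gaps" can be illustrated with a toy example: when one key's dot product with the query exceeds all others by a gap Δ, softmax attention approximates hard selection with error that shrinks roughly like e^(-Δ). The keys and values below are invented for illustration and are not the paper's construction:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

# Each context token i carries a value vector v_i.
V = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
q = np.array([1.0, 0.0])

errs = []
for Delta in (1.0, 5.0, 20.0):
    # Keys engineered so that key 1's dot product with the
    # query exceeds the others by a gap Delta.
    K = np.array([[0.0, 0.0],
                  [Delta, 0.0],
                  [0.0, 0.0]])
    attn = softmax(K @ q)    # concentrates on token 1 as Delta grows
    out = attn @ V
    errs.append(np.linalg.norm(out - V[1]))
    print(Delta, errs[-1])
```

Widening the gap drives the attention output arbitrarily close to the selected value vector, which is the mechanism that lets a prompt force a frozen attention head to follow an intended computation.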
Effects of Feature Correlations on Associative Memory Capacity
Bielmeier, Stefan, Friedland, Gerald
We investigate how feature correlations influence the capacity of Dense Associative Memory (DAM), a Transformer attention-like model. Practical machine learning scenarios involve feature-correlated data and learn representations in the input space, but current capacity analyses do not account for this. We develop an empirical framework to analyze the effects of data structure on capacity dynamics. Specifically, we systematically construct datasets that vary in feature correlation and pattern separation using Hamming distance from information theory, and compute the model's corresponding storage capacity using a simple binary search algorithm. Our experiments confirm that memory capacity scales exponentially with increasing separation in the input space. Feature correlations do not alter this relationship fundamentally, but reduce capacity slightly at constant separation. This effect is amplified at higher polynomial degrees in the energy function, suggesting that Associative Memory is more limited in depicting higher-order interactions between features than patterns. Our findings bridge theoretical work and practical settings for DAM, and might inspire more data-centric methods.
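The measurement procedure described above, storing random patterns and binary-searching the largest count with perfect recall, can be sketched as follows. This is an illustrative toy, not the authors' framework: the network size, polynomial degree of the energy, and one-step recall criterion are assumptions, and each binary-search probe draws a fresh random pattern set.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 64   # neurons (illustrative)
n = 3    # polynomial degree of the energy F(x) = x**n (illustrative)

def recalls_all(patterns):
    """True if every stored +/-1 pattern is a fixed point of one
    synchronous Dense Associative Memory update."""
    X = patterns                            # shape (M, N)
    for xi in X:
        overlaps = X @ xi                   # overlap with each memory
        field = (n * overlaps**(n - 1)) @ X # gradient of sum_mu F(overlap_mu)
        if not np.array_equal(np.sign(field), xi):
            return False
    return True

def capacity(lo=1, hi=512):
    """Binary-search the largest pattern count with perfect
    one-step recall; each probe draws a fresh random set."""
    while lo < hi:
        mid = (lo + hi + 1) // 2
        pats = rng.choice([-1.0, 1.0], size=(mid, N))
        if recalls_all(pats):
            lo = mid
        else:
            hi = mid - 1
    return lo

print(capacity())
```

Raising the degree `n` sharpens the energy around each stored pattern, which is why capacity grows with the polynomial degree; the abstract's finding is that feature correlations erode this at constant Hamming separation.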
In-context denoising with one-layer transformers: connections between attention and associative memory retrieval
Smart, Matthew, Bietti, Alberto, Sengupta, Anirvan M.
We introduce in-context denoising, a task that refines the connection between attention-based architectures and dense associative memory (DAM) networks, also known as modern Hopfield networks. Using a Bayesian framework, we show theoretically and empirically that certain restricted denoising problems can be solved optimally even by a single-layer transformer. We demonstrate that a trained attention layer processes each denoising prompt by performing a single gradient descent update on a context-aware DAM energy landscape, where context tokens serve as associative memories and the query token acts as an initial state. This one-step update yields better solutions than exact retrieval of either a context token or a spurious local minimum, providing a concrete example of DAM networks extending beyond the standard retrieval paradigm. Overall, this work solidifies the link between associative memory and attention mechanisms first identified by Ramsauer et al., and demonstrates the relevance of associative memory models in the study of in-context learning.
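The "single gradient descent update" the abstract describes coincides with the modern Hopfield retrieval rule of Ramsauer et al.: one update of the query state is exactly a softmax attention readout over the stored patterns. A minimal sketch (the dimensions, inverse temperature β, and noise level are illustrative):

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
d, M = 64, 5
X = rng.normal(size=(M, d))  # context tokens act as stored memories
beta = 1.0                   # inverse temperature

def hopfield_update(q):
    """One modern-Hopfield update (Ramsauer et al.): exactly a
    softmax attention readout over the memories X."""
    return softmax(beta * X @ q) @ X

clean = X[2]
noisy = clean + 0.1 * rng.normal(size=d)  # query token: a corrupted memory
retrieved = hopfield_update(noisy)
print(np.linalg.norm(noisy - clean), np.linalg.norm(retrieved - clean))
```

The one-step output is a softmax-weighted mixture of context tokens rather than an exact copy of any single one, which is the sense in which the abstract says the update can beat exact retrieval.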