modern hopfield model
- South America > Brazil (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- (4 more...)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- (4 more...)
Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories.We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code.This enables us to cast the memorization problem in KHMs into a point arrangement problem on a hypersphere.We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code.This unique perspective leads to: 1. An analysis of how KHMs achieve optimal memory capacity, and identify corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and optimal asymptotic memory capacity for modern Hopfield models.2. A sub-linear time algorithm $\mathtt{U}\text{-}\mathtt{Hop}$+ to reach KHMs' optimal capacity. 3. An analysis of the scaling behavior of the required feature dimension relative to the number of stored memories.These efforts improve both the retrieval capability of KHMs and the representation learning of corresponding transformers.Experimentally, we provide thorough numerical results to back up theoretical findings.
- South America > Brazil (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- (4 more...)
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- (4 more...)
Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories.We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code.This enables us to cast the memorization problem in KHMs into a point arrangement problem on a hypersphere.We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code.This unique perspective leads to: 1. An analysis of how KHMs achieve optimal memory capacity, and identify corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and optimal asymptotic memory capacity for modern Hopfield models.2.
Review for NeurIPS paper: Modern Hopfield Networks and Attention for Immune Repertoire Classification
Weaknesses: This manuscript contains a highly theoretical analysis of modern Hopfield networks and their relationship to the attention mechanism of a transformer model. It also contains a deep model that addresses the machine learning task of immune repertoire classification. The major issue with this submission is that the connection between the two topics addressed in this paper, (i) classification of immune repertoires, and (ii) equivalence of the update rule of modern Hopfield networks and the attention mechanism of the transformer, is at best unclear. It feels as though two distinct papers have been condensed into one. Overall, combining these two results into one paper results in a main text manuscript that does not provide sufficient detail about either.
Provably Optimal Memory Capacity for Modern Hopfield Models: Transformer-Compatible Dense Associative Memories as Spherical Codes
Hu, Jerry Yao-Chieh, Wu, Dennis, Liu, Han
We study the optimal memorization capacity of modern Hopfield models and Kernelized Hopfield Models (KHMs), a transformer-compatible class of Dense Associative Memories. We present a tight analysis by establishing a connection between the memory configuration of KHMs and spherical codes from information theory. Specifically, we treat the stored memory set as a specialized spherical code. This enables us to cast the memorization problem in KHMs into a point arrangement problem on a hypersphere. We show that the optimal capacity of KHMs occurs when the feature space allows memories to form an optimal spherical code. This unique perspective leads to: (i) An analysis of how KHMs achieve optimal memory capacity, and identify corresponding necessary conditions. Importantly, we establish an upper capacity bound that matches the well-known exponential lower bound in the literature. This provides the first tight and optimal asymptotic memory capacity for modern Hopfield models. (ii) A sub-linear time algorithm $\mathtt{U}\text{-}\mathtt{Hop}$+ to reach KHMs' optimal capacity. (iii) An analysis of the scaling behavior of the required feature dimension relative to the number of stored memories. These efforts improve both the retrieval capability of KHMs and the representation learning of corresponding transformers. Experimentally, we provide thorough numerical results to back up theoretical findings.
- South America > Brazil (0.04)
- North America > United States > Illinois > Cook County > Evanston (0.04)
- Europe > Austria > Vienna (0.04)
- Asia > Middle East > Jordan (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
- (3 more...)