Associative Transformer
Sun, Yuwei, Ochiai, Hideya, Wu, Zhirong, Lin, Stephen, Kanai, Ryota
–arXiv.org Artificial Intelligence
Emerging from the pairwise attention in conventional Transformers, there is a growing interest in sparse attention mechanisms that align more closely with localized, contextual learning in the biological brain. Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs. However, these methods are parameter inefficient and fail in more complex relational reasoning tasks. To this end, we propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches, improving parameter efficiency and performance in relational reasoning tasks.

Sparse knowledge association can find resonance with the neuroscientific grounding of the Global Workspace Theory (GWT) (Baars, 1988; Dehaene et al., 1998; VanRullen & Kanai, 2020; Juliani et al., 2022). GWT explains a fundamental cognitive architecture for working memory in the brain where diverse specialized modules compete to write information into a shared workspace through a communication bottleneck. The bottleneck facilitates the processing of content-addressable information using attention guided by contents in the shared workspace (Awh et al., 2006; Gazzaley & Nobre, 2012). A bottleneck guides models to generalize in a manner consistent with the underlying data distribution through inductive biases of sparsity (Baxter, 2000; Goyal & Bengio, 2022), resulting in superior performance in tasks such as relational reasoning.
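The workspace-bottleneck idea described above can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch module, not the authors' AiT implementation: the class name `BottleneckWorkspace`, the slot count, and all dimensions are assumptions chosen for the example.

```python
# A minimal sketch (not the paper's implementation) of a shared-workspace
# bottleneck: a small set of learned latent slots cross-attends over many
# input patches, forcing communication through a narrow, content-addressable
# channel. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class BottleneckWorkspace(nn.Module):
    def __init__(self, dim: int = 64, num_slots: int = 8, num_heads: int = 4):
        super().__init__()
        # Learned workspace slots; num_slots << number of patches is the bottleneck.
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        # Write step: slots (queries) compete to read from patches (keys/values).
        self.write_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Broadcast step: patches (queries) retrieve content from the updated slots.
        self.read_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim)
        b = patches.shape[0]
        slots = self.slots.unsqueeze(0).expand(b, -1, -1)
        # Write patch information into the workspace through the bottleneck.
        slots, _ = self.write_attn(slots, patches, patches)
        # Broadcast workspace contents back to every patch.
        out, _ = self.read_attn(patches, slots, slots)
        return patches + out  # residual update of the patch representations


# Usage: 196 ViT-style patch embeddings squeezed through 8 workspace slots.
x = torch.randn(2, 196, 64)
print(BottleneckWorkspace()(x).shape)  # torch.Size([2, 196, 64])
```

The key design point is that `num_slots` is far smaller than the number of patches, so all patch-to-patch communication must pass through the narrow workspace, mirroring the competition to write into a shared workspace that GWT describes.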
Jan-30-2024