Associative Transformer
Sun, Yuwei, Ochiai, Hideya, Wu, Zhirong, Lin, Stephen, Kanai, Ryota
–arXiv.org Artificial Intelligence
Emerging from the pairwise attention in conventional Transformers, there is a growing interest in sparse attention mechanisms that align more closely with localized, contextual learning in the biological brain. Existing studies such as the Coordination method employ iterative cross-attention mechanisms with a bottleneck to enable the sparse association of inputs. However, these methods are parameter inefficient and fail in more complex relational reasoning tasks. To this end, we propose Associative Transformer (AiT) to enhance the association among sparsely attended input patches, improving parameter efficiency and performance in relational reasoning tasks.

Sparse knowledge association can find resonance with the neuroscientific grounding of the Global Workspace Theory (GWT) (Baars, 1988; Dehaene et al., 1998; VanRullen & Kanai, 2020; Juliani et al., 2022). GWT explains a fundamental cognitive architecture for working memory in the brain where diverse specialized modules compete to write information into a shared workspace through a communication bottleneck. The bottleneck facilitates the processing of content-addressable information using attention guided by contents in the shared workspace (Awh et al., 2006; Gazzaley & Nobre, 2012). A bottleneck guides models to generalize in a manner consistent with the underlying data distribution through inductive biases of sparsity (Baxter, 2000; Goyal & Bengio, 2022), resulting in superior performance in tasks such as relational reasoning.
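The workspace-bottleneck idea described above can be made concrete with a short sketch. The following is a minimal, illustrative PyTorch module, not the authors' AiT implementation: the class name `BottleneckWorkspace`, the slot count, and all dimensions are assumptions chosen for the example.

```python
# A minimal sketch (not the paper's implementation) of a shared-workspace
# bottleneck: a small set of learned latent slots cross-attends over many
# input patches, forcing communication through a narrow, content-addressable
# channel. All names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class BottleneckWorkspace(nn.Module):
    def __init__(self, dim: int = 64, num_slots: int = 8, num_heads: int = 4):
        super().__init__()
        # Learned workspace slots; num_slots << number of patches is the bottleneck.
        self.slots = nn.Parameter(torch.randn(num_slots, dim) * 0.02)
        # Write step: slots (queries) compete to read from patches (keys/values).
        self.write_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Broadcast step: patches (queries) retrieve content from the updated slots.
        self.read_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, dim)
        b = patches.shape[0]
        slots = self.slots.unsqueeze(0).expand(b, -1, -1)
        # Write patch information into the workspace through the bottleneck.
        slots, _ = self.write_attn(slots, patches, patches)
        # Broadcast workspace contents back to every patch.
        out, _ = self.read_attn(patches, slots, slots)
        return patches + out  # residual update of the patch representations


# Usage: 196 ViT-style patch embeddings squeezed through 8 workspace slots.
x = torch.randn(2, 196, 64)
print(BottleneckWorkspace()(x).shape)  # torch.Size([2, 196, 64])
```

The key design point is that `num_slots` is far smaller than the number of patches, so all patch-to-patch communication must pass through the narrow workspace, mirroring the competition to write into a shared workspace that GWT describes.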
Jan-30-2024