OTCE: Hybrid SSM and Attention with Cross Domain Mixture of Experts to construct Observer-Thinker-Conceiver-Expresser

Shi, Jingze, Xie, Ting, Wu, Bingheng, Zheng, Chunjun, Wang, Kai

Jun-24-2024–arXiv.org Artificial Intelligence

The Transformer architecture (Attention Is All You Need, Vaswani et al. 2017) is popular in modern deep learning language modeling. It can directly capture the relationship between any two elements in a sequence and handles long-distance dependencies effectively. However, the architecture has two main drawbacks. First, when processing long sequences, the quadratic complexity of its self-attention mechanism and the size of its cache limit the ability to handle long contexts. Second, the Transformer lacks a single summary state, which means that each generated token must be computed over the entire context. Meanwhile, the selective state space model Mamba (Gu and Dao 2023) has emerged. Through its selective state update mechanism, Mamba scales linearly with sequence length during training and maintains a constant state size during generation.
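The scaling contrast described above can be made concrete with a rough count of stored elements. This is a minimal sketch under simplifying assumptions, not the paper's accounting: the function names and the sizes (`d_model=1024`, `d_state=16`, `n_layers=24`) are hypothetical, attention heads and cache-sharing schemes are ignored, and the SSM state is modeled as a plain `d_model x d_state` matrix per layer.

```python
def attention_costs(seq_len, d_model, n_layers):
    """Rough per-sequence costs for a plain Transformer decoder (illustrative only)."""
    # One attention score per (query, key) pair: grows quadratically with length.
    score_entries = seq_len * seq_len
    # Keys and values are cached for every past token in every layer: grows linearly.
    kv_cache_elems = 2 * n_layers * seq_len * d_model
    return score_entries, kv_cache_elems


def ssm_state_elems(d_model, d_state, n_layers):
    """Recurrent state size for a simplified SSM; it does not depend on sequence length."""
    return n_layers * d_model * d_state


if __name__ == "__main__":
    # Hypothetical sizes chosen only to show the trend.
    for n in (1_000, 2_000, 4_000):
        scores, cache = attention_costs(n, d_model=1024, n_layers=24)
        state = ssm_state_elems(d_model=1024, d_state=16, n_layers=24)
        print(f"len={n}: scores={scores}, kv_cache={cache}, ssm_state={state}")
```

Doubling the sequence length quadruples the score count and doubles the cache, while the SSM state stays fixed. That fixed state is what the passage means by a single summary state.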


