Theoretical Constraints on the Expressive Power of $\mathsf{RoPE}$-based Tensor Attention Transformers

Li, Xiaoyu, Liang, Yingyu, Shi, Zhenmei, Song, Zhao, Wan, Mingda

Dec-23-2024–arXiv.org Artificial Intelligence

Tensor Attention extends traditional attention mechanisms by capturing high-order correlations across multiple modalities, addressing the limitations of classical matrix-based attention. Meanwhile, Rotary Position Embedding ($\mathsf{RoPE}$) has shown superior performance in encoding positional information in long-context scenarios, significantly enhancing transformer models' expressiveness. Despite these empirical successes, the theoretical limitations of these technologies remain underexplored. In this study, we analyze the circuit complexity of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention, showing that with polynomial precision, constant-depth layers, and linear or sublinear hidden dimension, they cannot solve fixed membership problems or $(A_{F,r})^*$ closure problems, under the assumption that $\mathsf{TC}^0 \neq \mathsf{NC}^1$. These findings highlight a gap between the empirical performance and theoretical constraints of Tensor Attention and $\mathsf{RoPE}$-based Tensor Attention Transformers, offering insights that could guide the development of more theoretically grounded approaches to Transformer model design and scaling.

data mining, large language model, machine learning, (22 more...)

arXiv.org Artificial Intelligence

Dec-23-2024

arXiv.org PDF

Add feedback

Country:
- North America > United States (1.00)

Genre:
- Research Report > New Finding (0.66)

Technology:
- Information Technology
  - Data Science > Data Mining (0.93)
  - Artificial Intelligence
    - Representation & Reasoning (1.00)
    - Natural Language
      - Large Language Model (1.00)
      - Chatbot (1.00)
    - Machine Learning
      - Statistical Learning (1.00)
      - Neural Networks > Deep Learning (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found