On the Computational Power of Decoder-Only Transformer Language Models
arXiv.org Artificial Intelligence
This article presents a theoretical evaluation of the computational universality of decoder-only transformer models. We extend the theoretical literature on transformer models and show that decoder-only transformer architectures (even with only a single layer and single attention head) are Turing complete under reasonable assumptions. From the theoretical analysis, we show sparsity/compressibility of the word embedding to be a necessary condition for Turing completeness to hold.
May-30-2023
- Country:
- North America > United States
- Tennessee > Davidson County > Nashville (0.04)
- Europe > Spain
- Canary Islands > Gran Canaria > Las Palmas de Gran Canaria (0.04)
- Genre:
- Research Report (0.40)