Transformers are Universal Predictors
Basu, Sourya; Choraria, Moulik; Varshney, Lav R.
arXiv.org Artificial Intelligence
We find limits to the Transformer architecture for language modeling and show it has a universal prediction property in an information-theoretic sense. We further analyze performance in non-asymptotic data regimes to understand the role of various components of the Transformer architecture, especially in the context of data-efficient training. We validate our theoretical analysis with experiments on both synthetic and real datasets.

In this sense, the Transformer architecture is said to have a universal computation property (Lu et al., 2021), reminiscent of predictive coding hypotheses of the brain that posit one basic operation in neurobiological information processing (Golkar et al., 2022). The basic predictive workings of Transformers and previous findings of universal approximation and computation properties motivate us to ask whether they also have a universal prediction property.
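To make the information-theoretic notion of universal prediction concrete, here is a minimal sketch (not the paper's construction) of a classical universal predictor: the Laplace add-one estimator on a Bernoulli source. Its cumulative log-loss per symbol approaches the source entropy as the sequence grows, i.e., the per-symbol redundancy vanishes, regardless of the unknown parameter `theta`. All names here (`laplace_prob`, `cumulative_log_loss`, `theta`) are illustrative choices, not identifiers from the paper.

```python
import math
import random

def laplace_prob(ones, total):
    """Laplace (add-one) estimate of P(next bit = 1) after seeing
    `ones` ones among `total` bits."""
    return (ones + 1) / (total + 2)

def cumulative_log_loss(bits):
    """Sequential log-loss (in nats) of the Laplace predictor on a bit sequence."""
    loss, ones = 0.0, 0
    for t, b in enumerate(bits):
        p1 = laplace_prob(ones, t)
        p = p1 if b == 1 else 1.0 - p1
        loss -= math.log(p)
        ones += b
    return loss

# Sample a long sequence from an (unknown to the predictor) Bernoulli source.
random.seed(0)
theta, n = 0.7, 10_000
bits = [1 if random.random() < theta else 0 for _ in range(n)]

# Per-symbol redundancy: average log-loss minus the source entropy (nats).
entropy = -(theta * math.log(theta) + (1 - theta) * math.log(1 - theta))
redundancy = cumulative_log_loss(bits) / n - entropy
```

The redundancy shrinks on the order of (log n)/n, which is the sense in which such a predictor is "universal": it asymptotically matches the best predictor that knows the source. The paper's contribution is to establish an analogous property for Transformer language models and to analyze the non-asymptotic regime.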
Jul-15-2023