Your Transformer May Not be as Powerful as You Expect

Shengjie Luo

Neural Information Processing Systems 

To overcome this problem and make the model more powerful, we first present sufficient conditions under which RPE-based Transformers achieve universal function approximation.
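As a concrete illustration of the setting, the following sketch shows single-head self-attention with a relative positional encoding (RPE) bias added to the attention logits. This is an assumed, minimal formulation of RPE-based attention for illustration, not the paper's specific construction or conditions; the function name `rpe_attention` and all shapes are hypothetical.

```python
import numpy as np

def rpe_attention(x, Wq, Wk, Wv, rel_bias):
    """Single-head attention with a relative positional bias.

    x: (n, d) input sequence; Wq, Wk, Wv: (d, d) projections;
    rel_bias: (2n - 1,) learned bias indexed by relative offset i - j.
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(d)
    # Add the relative positional bias b_{i-j}; offsets i - j range over
    # [-(n-1), n-1], shifted by n - 1 to index into rel_bias.
    idx = np.arange(n)
    logits += rel_bias[idx[:, None] - idx[None, :] + n - 1]
    # Numerically stable softmax over the key dimension.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = rpe_attention(x, Wq, Wk, Wv, rng.normal(size=2 * n - 1))
print(out.shape)  # (4, 8)
```

The key design point is that position information enters only through the bias on the logits rather than through absolute positional embeddings added to the inputs, which is what distinguishes the RPE-based Transformers studied here.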