Your Transformer May Not be as Powerful as You Expect

Shengjie Luo

Neural Information Processing Systems 

To overcome this problem and make the model more powerful, we first present sufficient conditions under which RPE-based Transformers achieve universal function approximation.
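As a concrete illustration of the setting, the following sketch shows single-head self-attention with a relative positional encoding (RPE) bias added to the attention logits. This is an assumed, minimal formulation of RPE-based attention for illustration, not the paper's specific construction or conditions; the function name `rpe_attention` and all shapes are hypothetical.

```python
import numpy as np

def rpe_attention(x, Wq, Wk, Wv, rel_bias):
    """Single-head attention with a relative positional bias.

    x: (n, d) input sequence; Wq, Wk, Wv: (d, d) projections;
    rel_bias: (2n - 1,) learned bias indexed by relative offset i - j.
    """
    n, d = x.shape
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    logits = q @ k.T / np.sqrt(d)
    # Add the relative positional bias b_{i-j}; offsets i - j range over
    # [-(n-1), n-1], shifted by n - 1 to index into rel_bias.
    idx = np.arange(n)
    logits += rel_bias[idx[:, None] - idx[None, :] + n - 1]
    # Numerically stable softmax over the key dimension.
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
n, d = 4, 8
x = rng.normal(size=(n, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = rpe_attention(x, Wq, Wk, Wv, rng.normal(size=2 * n - 1))
print(out.shape)  # (4, 8)
```

The key design point is that position information enters only through the bias on the logits rather than through absolute positional embeddings added to the inputs, which is what distinguishes the RPE-based Transformers studied here.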