Strassen Attention, Split VC Dimension and Compositionality in Transformers
Neural Information Processing Systems
We propose the first method to show theoretical limitations for one-layer softmax Transformers with arbitrarily many bits of precision (even infinitely many). We establish those limitations for three tasks that require advanced reasoning. The first task, Match 3 (Sanford et al., 2023), requires looking at all possible token triplets in an input sequence. The second and third tasks address compositionality-based reasoning: function composition (Peng et al., 2024) and binary relations composition, respectively. We formally prove the inability of one-layer softmax Transformers to solve any of these tasks.
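To make the "all possible token triplets" requirement concrete, here is a minimal brute-force sketch of the Match 3 task. It assumes the common formulation attributed to Sanford et al. (2023): position i is labeled 1 when some pair of positions j, k gives a sum of the three tokens that is zero modulo a fixed modulus. The abstract does not spell out this definition, so the function name `match3`, the `modulus` parameter, and the choice to let j and k range over all positions (including i) are assumptions made for illustration.

```python
from itertools import product


def match3(tokens, modulus):
    """Brute-force Match 3 labels (assumed formulation, not from the abstract).

    Position i gets label 1 if there exist positions j, k with
    tokens[i] + tokens[j] + tokens[k] divisible by `modulus`, else 0.
    """
    n = len(tokens)
    labels = []
    for i in range(n):
        # Check every ordered pair (j, k); j and k may repeat or equal i.
        found = any(
            (tokens[i] + tokens[j] + tokens[k]) % modulus == 0
            for j, k in product(range(n), repeat=2)
        )
        labels.append(int(found))
    return labels
```

The reference computation inspects every triplet that contains position i, which costs cubic time in the sequence length overall. That triple-wise dependence is the kind of interaction the abstract says a one-layer softmax Transformer cannot capture.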
Jun-10-2026, 07:26:03 GMT