On the Expressivity of Selective State-Space Layers: A Multivariate Polynomial Approach
Edo Cohen-Karlik, Itamar Zimerman, Liane Galanti, Ido Atad, Amir Globerson, Lior Wolf
–arXiv.org Artificial Intelligence
Recent advances in efficient sequence modeling have introduced selective state-space layers, a key component of the Mamba architecture, which have demonstrated remarkable success in a wide range of NLP and vision tasks. While Mamba's empirical performance has matched or surpassed state-of-the-art transformers on these diverse benchmarks, the theoretical foundations underlying its representational power remain less explored. In this work, we investigate the expressivity of selective state-space layers using multivariate polynomials and prove that they surpass linear transformers in expressiveness. Consequently, our findings reveal that Mamba offers superior representational power over linear attention-based models for long sequences, without sacrificing generalization. Our theoretical insights are validated by a comprehensive set of empirical experiments on various datasets.
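The abstract contrasts selective state-space layers with linear attention, where "selective" means that the state-space parameters are themselves functions of the current input rather than fixed. The NumPy sketch below illustrates this idea with a simplified diagonal selective recurrence; it is an assumption-laden illustration, not the paper's construction, and all names (selective_ssm, W_B, W_C, W_delta) are hypothetical.

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_delta):
    """Rough sketch of a diagonal selective state-space (Mamba-style) recurrence.

    x       : (T, d) input sequence
    A       : (d, n) fixed negative state parameters, one diagonal per channel
    W_B     : (d, n) projection producing the input-dependent B_t
    W_C     : (d, n) projection producing the input-dependent C_t
    W_delta : (d, d) projection producing the input-dependent step size

    Because B_t, C_t, and delta_t depend on x_t, the recurrence is
    input-dependent ("selective") rather than a fixed linear time-invariant SSM.
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                            # one n-dimensional state per channel
    y = np.zeros((T, d))
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)   # softplus step size, shape (d,)
        B_t = x[t] @ W_B                            # input-dependent input map, shape (n,)
        C_t = x[t] @ W_C                            # input-dependent read-out, shape (n,)
        A_bar = np.exp(delta[:, None] * A)          # discretized state transition, (d, n)
        B_bar = delta[:, None] * B_t[None, :]       # simplified discretized input map, (d, n)
        h = A_bar * h + B_bar * x[t][:, None]       # selective recurrence
        y[t] = h @ C_t                              # per-channel read-out, shape (d,)
    return y

# Illustrative usage with random weights (not trained parameters).
rng = np.random.default_rng(0)
T, d, n = 16, 4, 8
x = rng.standard_normal((T, d))
A = -np.exp(rng.standard_normal((d, n)))            # negative values keep the recurrence stable
y = selective_ssm(
    x, A,
    0.1 * rng.standard_normal((d, n)),
    0.1 * rng.standard_normal((d, n)),
    0.1 * rng.standard_normal((d, d)),
)
```

In a linear attention layer, by contrast, the analogous recurrent state is accumulated with input-independent (identity) transition dynamics, which is the kind of structural gap the paper's polynomial analysis formalizes.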
Feb-4-2025