On the Expressivity of Selective State-Space Layers: A Multivariate Polynomial Approach
Edo Cohen-Karlik, Itamar Zimerman, Liane Galanti, Ido Atad, Amir Globerson, Lior Wolf
–arXiv.org Artificial Intelligence
Recent advances in efficient sequence modeling have introduced selective state-space layers, a key component of the Mamba architecture, which have demonstrated remarkable success in a wide range of NLP and vision tasks. While Mamba's empirical performance has matched or surpassed state-of-the-art transformers on these diverse benchmarks, the theoretical foundations underlying its representational power remain less explored. In this work, we investigate the expressivity of selective state-space layers using multivariate polynomials and prove that they surpass linear transformers in expressiveness. Consequently, our findings reveal that Mamba offers superior representational power over linear attention-based models for long sequences, without sacrificing generalization. Our theoretical insights are validated by a comprehensive set of empirical experiments on various datasets.
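The abstract contrasts selective state-space layers with linear attention, where "selective" means that the state-space parameters are themselves functions of the current input rather than fixed. The NumPy sketch below illustrates this idea with a simplified diagonal selective recurrence; it is an assumption-laden illustration, not the paper's construction, and all names (selective_ssm, W_B, W_C, W_delta) are hypothetical.

```python
import numpy as np

def selective_ssm(x, A, W_B, W_C, W_delta):
    """Rough sketch of a diagonal selective state-space (Mamba-style) recurrence.

    x       : (T, d) input sequence
    A       : (d, n) fixed negative state parameters, one diagonal per channel
    W_B     : (d, n) projection producing the input-dependent B_t
    W_C     : (d, n) projection producing the input-dependent C_t
    W_delta : (d, d) projection producing the input-dependent step size

    Because B_t, C_t, and delta_t depend on x_t, the recurrence is
    input-dependent ("selective") rather than a fixed linear time-invariant SSM.
    """
    T, d = x.shape
    n = A.shape[1]
    h = np.zeros((d, n))                            # one n-dimensional state per channel
    y = np.zeros((T, d))
    for t in range(T):
        delta = np.logaddexp(0.0, x[t] @ W_delta)   # softplus step size, shape (d,)
        B_t = x[t] @ W_B                            # input-dependent input map, shape (n,)
        C_t = x[t] @ W_C                            # input-dependent read-out, shape (n,)
        A_bar = np.exp(delta[:, None] * A)          # discretized state transition, (d, n)
        B_bar = delta[:, None] * B_t[None, :]       # simplified discretized input map, (d, n)
        h = A_bar * h + B_bar * x[t][:, None]       # selective recurrence
        y[t] = h @ C_t                              # per-channel read-out, shape (d,)
    return y

# Illustrative usage with random weights (not trained parameters).
rng = np.random.default_rng(0)
T, d, n = 16, 4, 8
x = rng.standard_normal((T, d))
A = -np.exp(rng.standard_normal((d, n)))            # negative values keep the recurrence stable
y = selective_ssm(
    x, A,
    0.1 * rng.standard_normal((d, n)),
    0.1 * rng.standard_normal((d, n)),
    0.1 * rng.standard_normal((d, d)),
)
```

In a linear attention layer, by contrast, the analogous recurrent state is accumulated with input-independent (identity) transition dynamics, which is the kind of structural gap the paper's polynomial analysis formalizes.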
Feb-4-2025