S7: Selective and Simplified State Space Layers for Sequence Modeling

Soydan, Taylan, Zubić, Nikola, Messikommer, Nico, Mishra, Siddhartha, Scaramuzza, Davide

arXiv.org Artificial Intelligence 

A central challenge in sequence modeling is efficiently handling tasks with extended contexts. While recent state-space models (SSMs) have made significant progress in this area, they often lack input-dependent filtering or require substantial increases in model complexity to handle input variability. We address this gap by introducing S7, a simplified yet powerful SSM that handles input dependence while incorporating stable reparameterization and specific design choices to dynamically adjust state transitions based on input content, maintaining both efficiency and performance. We prove that this reparameterization ensures stability in long-sequence modeling by keeping state transitions well-behaved over time. It also controls the gradient norm, enabling efficient training and preventing exploding or vanishing gradients. S7 significantly outperforms baselines across a range of sequence modeling tasks, including neuromorphic event-based datasets, Long Range Arena benchmarks, and various physical and biological time series. Overall, S7 offers a more straightforward approach to sequence modeling that does not rely on complex, domain-specific inductive biases, achieving significant improvements across key benchmarks.

Sequence modeling is a fundamental challenge in deep learning, with applications spanning natural language processing, computer vision, audio processing, and genomics (Sutskever et al., 2014; Graves et al., 2013). The core problem lies in effectively capturing and utilizing information from long input sequences while maintaining computational efficiency. Convolutional models (Bai et al., 2018), while efficient, often fail to capture global context. The key challenge is to design a model that can (1) efficiently process very long sequences, (2) adaptively filter and retain relevant information over extended time horizons, (3) perform content-based reasoning, and (4) maintain a compact state representation. Recent advances in Deep State Space Models (Deep SSMs) (Gu et al., 2020; Hasani et al., 2020) have shown promise, but existing approaches like S4 (Gu et al., 2022a) and Mamba (Gu & Dao, 2023) still face limitations in balancing these requirements.
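
To make the idea of input-dependent state transitions with a stable reparameterization concrete, the following is a minimal PyTorch sketch, not the authors' implementation: it assumes an illustrative diagonal recurrence, and the names (SelectiveSSMSketch, delta_proj, log_a) are hypothetical rather than taken from the S7 code.

```python
# Minimal sketch (not the authors' implementation) of an input-dependent
# diagonal SSM recurrence with a stable reparameterization. All names
# (SelectiveSSMSketch, delta_proj, log_a) are illustrative assumptions;
# the exact S7 parameterization may differ.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveSSMSketch(nn.Module):
    def __init__(self, d_model: int, d_state: int):
        super().__init__()
        # Raw transition parameter; reparameterized below so the effective
        # transition magnitude stays in (0, 1), keeping the recurrence stable.
        self.log_a = nn.Parameter(torch.randn(d_state))
        self.in_proj = nn.Linear(d_model, d_state)      # input -> state drive
        self.delta_proj = nn.Linear(d_model, d_state)   # input-dependent step size
        self.out_proj = nn.Linear(d_state, d_model)     # state -> output

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        batch, seq_len, _ = x.shape
        h = x.new_zeros(batch, self.log_a.numel())
        outputs = []
        for t in range(seq_len):
            xt = x[:, t, :]
            # Input-dependent step size makes the state update content-aware.
            delta = F.softplus(self.delta_proj(xt))              # (batch, d_state), > 0
            # Stable reparameterization: a_t = exp(-delta * softplus(log_a)) lies in
            # (0, 1), so repeated multiplication cannot blow up the state or gradients.
            a_t = torch.exp(-delta * F.softplus(self.log_a))
            h = a_t * h + (1.0 - a_t) * self.in_proj(xt)         # selective state update
            outputs.append(self.out_proj(h))
        return torch.stack(outputs, dim=1)                        # (batch, seq_len, d_model)


if __name__ == "__main__":
    model = SelectiveSSMSketch(d_model=16, d_state=32)
    y = model(torch.randn(2, 100, 16))
    print(y.shape)  # torch.Size([2, 100, 16])
```

In this sketch, the gradient through the recurrence is bounded because every per-step transition factor has magnitude below one; the input-dependent delta plays the role of content-based filtering, letting the model forget or retain state depending on the current input.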