Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection
Neural Information Processing Systems
Recent advances such as Mamba enhance state space models (SSMs) with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long-sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging, as naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
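To make "sparse mixtures of linear projection experts" concrete, the sketch below shows a generic top-k routed mixture of linear projections. It is an illustration under stated assumptions, not the RoM design: the class name `SparseLinearMoE`, the softmax router, top-k selection with renormalized gates, the Gaussian initialization, and all dimensions are hypothetical choices, and the abstract does not say which SSM projections are replaced or how routing is shared among them.

```python
import math
import random


def matvec(weight, x):
    """Multiply a matrix stored as a list of rows (out_dim x in_dim) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class SparseLinearMoE:
    """Hypothetical sparse mixture of linear projection experts (top-k routing).

    Assumption: a linear router scores every expert per token, the k best
    experts are kept, and their projections are mixed by renormalized gates.
    """

    def __init__(self, in_dim, out_dim, num_experts, k=1, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)
        self.k = k
        # Router weights: one score row per expert.
        self.router = [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                       for _ in range(num_experts)]
        # Each expert is an independent (out_dim x in_dim) projection matrix.
        self.experts = [
            [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]
            for _ in range(num_experts)
        ]

    def route(self, x):
        """Return (expert_index, gate) pairs for the k highest-probability experts."""
        probs = softmax(matvec(self.router, x))
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[: self.k]
        norm = sum(probs[e] for e in top)
        return [(e, probs[e] / norm) for e in top]

    def __call__(self, x):
        """Project one token; only the selected experts' matrices are used."""
        out = None
        for e, gate in self.route(x):
            y = matvec(self.experts[e], x)
            if out is None:
                out = [gate * yi for yi in y]
            else:
                out = [o + gate * yi for o, yi in zip(out, y)]
        return out


if __name__ == "__main__":
    moe = SparseLinearMoE(in_dim=4, out_dim=3, num_experts=8, k=2)
    token = [0.5, -1.0, 0.25, 2.0]
    print(moe.route(token))  # two (expert, gate) pairs whose gates sum to 1
    print(moe(token))        # a 3-dimensional projected vector
```

The design point this illustrates is the usual sparse-MoE trade-off: total projection parameters grow with `num_experts`, while the per-token projection cost grows only with `k`, since unselected experts are never multiplied.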
Jun-18-2026, 18:36:11 GMT