Routing Mamba: Scaling State Space Models with Mixture-of-Experts Projection

Jun-18-2026, 18:36:11 GMT–Neural Information Processing Systems

Recent advances, such as Mamba, further enhance state space models (SSMs) with input-dependent gating and hardware-aware implementations, positioning them as strong alternatives to Transformers for long-sequence modeling. However, efficiently scaling the expressive power of SSMs, particularly with Mixture of Experts (MoE), remains challenging: naive integration attempts often falter or degrade performance. In this work, we introduce Routing Mamba (RoM), a novel approach that scales SSM parameters using sparse mixtures of linear projection experts.
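The abstract states the idea only at a high level. The sketch below is an illustration of generic token-level top-k routing over linear projection experts, not the authors' RoM implementation. The router matrix, the softmax gating renormalized over the selected experts, and the names `matvec`, `softmax`, and `routed_projection` are all assumptions made for this example. How RoM places, shares, or routes experts across Mamba's individual projections is not modeled here.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a (d_out x d_in) matrix by a length-d_in vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routed_projection(x: Vector, router: Matrix, experts: List[Matrix],
                      k: int = 1) -> Tuple[Vector, List[int]]:
    """Project one token with a sparse top-k mixture of linear experts.

    Hypothetical sketch: `router` is a (num_experts x d_in) matrix giving one
    logit per expert; `experts` holds one (d_out x d_in) matrix per expert.
    Only the k highest-probability experts are evaluated, and their outputs
    are mixed with softmax weights renormalized over those k experts.
    """
    probs = softmax(matvec(router, x))
    # Indices of the k most probable experts (ties keep the lower index first).
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    out = [0.0] * len(experts[0])
    for i in top:
        y = matvec(experts[i], x)   # only selected experts do any compute
        g = probs[i] / norm         # renormalized gate weight
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


# Toy setup: two 2x2 experts (identity and swap) and a router that
# prefers expert 0 when x[0] is large and expert 1 when x[1] is large.
router = [[1.0, 0.0], [0.0, 1.0]]
experts = [[[1.0, 0.0], [0.0, 1.0]],   # expert 0: identity
           [[0.0, 1.0], [1.0, 0.0]]]   # expert 1: swap coordinates

if __name__ == "__main__":
    for token in ([2.0, 0.5], [0.0, 3.0]):
        print(token, routed_projection(token, router, experts, k=1))
```

With `k=1`, each token is projected by a single expert, so the total parameter count grows with the number of experts while per-token compute stays close to that of one dense projection. This is the general trade-off sparse MoE layers aim for; whether RoM uses top-1 or larger k is not stated in this abstract.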

large language model, machine learning, natural language

Genre:
- Research Report
  - Experimental Study (1.00)
  - New Finding (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language
    - Large Language Model (1.00)
    - Chatbot (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)