MoxE: Mixture of xLSTM Experts with Entropy-Aware Routing for Efficient Language Modeling
Thiombiano, Abdoul Majid O., Hnich, Brahim, Mrad, Ali Ben, Mkaouer, Mohamed Wiem
arXiv.org Artificial Intelligence
However, the quadratic complexity O(n²) of the attention mechanism (where n is the sequence length) makes it computationally expensive to train and deploy large models, particularly for long sequences. This inherent limitation poses significant challenges for scalability and efficiency in real-world applications. One highly effective technique widely adopted to mitigate these challenges in training and deploying such massive models is the Mixture of Experts (MoE) framework [5, 11]. In a MoE architecture, at inference time the model activates only a sparse subset of its total parameters to process each input, dramatically reducing runtime computation and enabling more efficient scaling. The sparse MoE approach has been successfully applied to various models, demonstrating significant improvements in efficiency while maintaining or even enhancing performance [2]. Traditional Long Short-Term Memory (LSTM) networks, while demonstrably powerful in sequence modeling, struggle to manage long-term dependencies and to perform efficient associative recall, particularly over extended sequences. The Extended Long Short-Term Memory (xLSTM) architecture [1] directly addresses these fundamental limitations by introducing novel memory structures and optimized computation approaches within the LSTM unit itself.
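To make the sparse-activation idea concrete, the sketch below shows a generic top-k MoE layer in PyTorch: a learned router scores each token, only the k highest-scoring experts are evaluated, and the router's entropy is computed as the kind of uncertainty signal an entropy-aware scheme could exploit. This is an illustrative sketch only; the class and parameter names (SimpleExpert, SparseMoELayer, num_experts, top_k) are hypothetical, the experts here are plain feed-forward blocks rather than the xLSTM experts used in MoxE, and the entropy term is not the paper's actual routing objective.

```python
# Minimal sketch of sparse top-k MoE routing (illustrative; not the MoxE implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleExpert(nn.Module):
    """Small feed-forward expert; MoxE would use xLSTM-based experts instead."""
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(d_model, d_hidden),
            nn.GELU(),
            nn.Linear(d_hidden, d_model),
        )

    def forward(self, x):
        return self.net(x)


class SparseMoELayer(nn.Module):
    """Routes each token to its top-k experts, so only a sparse subset of
    parameters is active per token at inference time."""
    def __init__(self, d_model: int, d_hidden: int, num_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            SimpleExpert(d_model, d_hidden) for _ in range(num_experts)
        )
        self.router = nn.Linear(d_model, num_experts)
        self.top_k = top_k

    def forward(self, x):
        # x: (batch, seq_len, d_model); flatten tokens for per-token routing.
        batch, seq_len, d_model = x.shape
        tokens = x.reshape(-1, d_model)

        logits = self.router(tokens)                      # (num_tokens, num_experts)
        probs = F.softmax(logits, dim=-1)

        # Router entropy: high entropy means the router is uncertain which expert
        # to use. An entropy-aware scheme could use this as an auxiliary loss or
        # routing signal (hypothetical here; see the paper for MoxE's mechanism).
        entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1).mean()

        # Keep only the top-k experts per token and renormalize their weights.
        topk_probs, topk_idx = probs.topk(self.top_k, dim=-1)
        topk_probs = topk_probs / topk_probs.sum(dim=-1, keepdim=True)

        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            idx = topk_idx[:, slot]
            weight = topk_probs[:, slot].unsqueeze(-1)
            for e, expert in enumerate(self.experts):
                mask = idx == e
                if mask.any():
                    # Only the selected experts run on their assigned tokens.
                    out[mask] += weight[mask] * expert(tokens[mask])

        return out.reshape(batch, seq_len, d_model), entropy
```

Because each token touches only top_k of num_experts experts, compute per token stays roughly constant as the total parameter count grows, which is the efficiency property the abstract refers to.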
May-6-2025