GRAPHMOE: Amplifying Cognitive Depth of Mixture-of-Experts Network via Introducing Self-Rethinking Mechanism
Tang, Chen, Lv, Bo, Zheng, Zifan, Yang, Bohao, Zhao, Kun, Liao, Ning, Wang, Xiaoxing, Xiong, Feiyu, Li, Zhiyu, Liu, Nayu, Jiang, Jingchi
arXiv.org Artificial Intelligence
Traditional Mixture-of-Experts (MoE) networks benefit from using multiple smaller expert models rather than a single large network. However, these experts typically operate independently, leaving open the question of whether interconnecting them could improve MoE performance. In response, we introduce GRAPHMOE, a novel method that aims to augment the cognitive depth of language models via a self-rethinking mechanism built on Pseudo GraphMoE networks. GRAPHMOE employs a recurrent routing strategy to simulate iterative thinking steps, thereby facilitating the flow of information among expert nodes. We implement the GRAPHMOE architecture using Low-Rank Adaptation (LoRA) and conduct extensive experiments on various benchmark datasets. The experimental results show that GRAPHMOE outperforms other LoRA-based models, achieving state-of-the-art (SOTA) performance. Additionally, this study explores a novel recurrent routing strategy that may inspire further advances in the reasoning capabilities of language models.
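The abstract does not give the routing equations, so the following is only a rough sketch of what a recurrent routing loop over LoRA-style experts could look like, not the authors' implementation. Every name here (`LoRAExpert`, `recurrent_route`, `steps`, `top_k`), the softmax top-k router, and the residual update are assumptions made for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class LoRAExpert:
    """Hypothetical expert: a low-rank map x -> B(Ax) with rank << dim.

    Both factors are randomly initialised here so the toy output is
    non-zero; this is an illustrative choice, not a training recipe.
    """

    def __init__(self, dim, rank, rng):
        self.A = [[rng.gauss(0, 0.1) for _ in range(dim)] for _ in range(rank)]
        self.B = [[rng.gauss(0, 0.1) for _ in range(rank)] for _ in range(dim)]

    def __call__(self, x):
        return matvec(self.B, matvec(self.A, x))


def recurrent_route(x, experts, router, steps, top_k):
    """Assumed recurrent routing: re-route the updated state each step.

    Each step scores the experts on the current state, mixes the outputs
    of the top_k experts, and adds the mix back to the state, so later
    steps see information produced by experts chosen earlier.
    """
    h = list(x)
    trace = []
    for _ in range(steps):
        probs = softmax(matvec(router, h))
        ranked = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)
        chosen = ranked[:top_k]
        norm = sum(probs[i] for i in chosen)
        update = [0.0] * len(h)
        for i in chosen:
            weight = probs[i] / norm
            out = experts[i](h)
            update = [u + weight * o for u, o in zip(update, out)]
        h = [a + b for a, b in zip(h, update)]  # residual update (assumption)
        trace.append(chosen)
    return h, trace


rng = random.Random(0)
dim, rank, n_experts = 8, 2, 4
experts = [LoRAExpert(dim, rank, rng) for _ in range(n_experts)]
router = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_experts)]
x = [rng.gauss(0, 1) for _ in range(dim)]
h, trace = recurrent_route(x, experts, router, steps=3, top_k=2)
```

Here `trace` records which experts were selected at each step, which makes it easy to see whether the routing changes as the state is updated. A single-pass MoE layer corresponds to `steps=1` in this sketch.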
Jan-14-2025
- Genre:
- Research Report
- Promising Solution (0.66)
- New Finding (0.46)