ResMoE: Space-efficient Compression of Mixture of Experts LLMs via Residual Restoration
Mengting Ai, Tianxin Wei, Yifan Chen, Zhichen Zeng, Ritchie Zhao, Girish Varatkar, Bita Darvish Rouhani, Xianfeng Tang, Hanghang Tong, Jingrui He
–arXiv.org Artificial Intelligence
Mixture-of-Experts (MoE) Transformer, the backbone architecture of multiple phenomenal language models, leverages sparsity by activating only a fraction of model parameters for each input token. The sparse structure, while allowing constant time costs, results in space inefficiency: we still need to load all the model parameters during inference. We introduce ResMoE, an innovative MoE approximation framework that utilizes Wasserstein barycenter to extract a common expert (barycenter expert) and approximate the residuals between this barycenter expert and the original ones. ResMoE enhances the space efficiency for inference of large-scale MoE Transformers in a one-shot and data-agnostic manner without retraining while maintaining minimal accuracy loss, thereby paving the way for broader accessibility to large language models.

The profound impact of the Transformer architecture in the domain of machine learning is undeniable, for the fields including natural language processing [3, 14, 18, 45, 48, 61] and computer vision [17, 39, 64], to name a few. To further improve the capabilities of pre-trained large language models (LLMs), one general strategy is to scale up their parameters. Mixture-of-Experts (MoE) [52] extends the traditional feedforward neural network (FFN) layer by replacing a single multilayer perceptron (MLP) with multiple MLPs, referred to as "experts". While enhancing the performance, sparse MoE keeps computing costs (FLOPs) comparable to the original dense model, as only a few selected experts will be activated each time. The framework of an MoE layer is demonstrated in
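The idea of storing one common expert plus a compressed residual per original expert can be illustrated with a minimal sketch. This is not the authors' implementation. It rests on two loudly stated assumptions: a plain element-wise mean stands in for the Wasserstein barycenter (which would additionally align neurons across experts), and magnitude pruning is used as one possible way to approximate the residuals. All function names (`mean_expert`, `sparsify_residual`, `restore_expert`, `frobenius_error`) are hypothetical.

```python
def mean_expert(experts):
    """Element-wise mean of equally shaped weight matrices.

    ASSUMPTION: a simple mean replaces the Wasserstein barycenter
    described in the abstract; no neuron alignment is performed.
    """
    n = len(experts)
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(e[i][j] for e in experts) / n for j in range(cols)]
            for i in range(rows)]


def sparsify_residual(expert, center, keep_ratio):
    """Keep only the largest-magnitude entries of (expert - center).

    ASSUMPTION: magnitude pruning is an illustrative choice of
    residual approximation. Returns a dict {(row, col): delta}.
    """
    residual = {(i, j): w - center[i][j]
                for i, row in enumerate(expert)
                for j, w in enumerate(row)}
    k = max(1, int(len(residual) * keep_ratio))
    top = sorted(residual, key=lambda ij: abs(residual[ij]), reverse=True)[:k]
    return {ij: residual[ij] for ij in top}


def restore_expert(center, sparse_residual):
    """Rebuild an approximate expert from the common expert and its residual."""
    restored = [row[:] for row in center]
    for (i, j), delta in sparse_residual.items():
        restored[i][j] += delta
    return restored


def frobenius_error(a, b):
    """Frobenius norm of the difference between two matrices."""
    return sum((x - y) ** 2
               for ra, rb in zip(a, b)
               for x, y in zip(ra, rb)) ** 0.5


# Toy usage: two 2x2 "experts" that share most of their weights.
experts = [[[1.0, 2.0], [3.0, 4.0]],
           [[3.0, 2.0], [3.0, 0.0]]]
center = mean_expert(experts)
residuals = [sparsify_residual(e, center, keep_ratio=0.25) for e in experts]
restored = [restore_expert(center, r) for r in residuals]
for original, approx in zip(experts, restored):
    print(frobenius_error(approx, original), frobenius_error(center, original))
```

In this toy setting only the common matrix and a quarter of each residual are stored, and each restored expert is closer to its original than the common expert alone. Whether such savings carry over to real MoE weights depends on how similar the experts are after alignment, which is what the barycenter step is meant to address.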
Mar-9-2025