MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Neural Information Processing Systems
The sparsely activated mixture of experts (MoE) model presents an effective alternative to densely activated (dense) models, combining improved accuracy with computational efficiency. However, training MoE models from scratch requires extensive data and computational resources, a challenge that limits their widespread adoption. To address this, we introduce MoE Jetpack, a framework designed to fine-tune abundant, easily accessible dense checkpoints into MoE models.
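The abstract does not spell out the conversion procedure, but a common way to seed an MoE model from a dense checkpoint is to copy the pretrained dense MLP weights into each expert and add a learned router, fine-tuning from there rather than training from scratch. The sketch below illustrates that general idea under those assumptions; all names (`DenseMLP`, `MoELayer`, `num_experts`, `top_k`) are illustrative, not the paper's API or its actual method.

```python
# Minimal sketch: building a sparse top-k MoE layer whose experts are all
# initialized from one pretrained dense MLP. Illustrative only.
import copy
import torch
import torch.nn as nn

class DenseMLP(nn.Module):
    """Stand-in for a pretrained transformer MLP block from a dense checkpoint."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

class MoELayer(nn.Module):
    def __init__(self, dense_mlp: DenseMLP, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        dim = dense_mlp.fc1.in_features
        # Each expert starts as a copy of the pretrained dense MLP, so the
        # MoE model inherits the dense checkpoint's knowledge at step zero.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )
        # Learned gate; trained (along with the experts) during fine-tuning.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = torch.topk(
            self.router(x).softmax(dim=-1), self.top_k, dim=-1
        )
        out = torch.zeros_like(x)
        # Route each token only to its top-k experts (sparse activation).
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

# Usage: turn a "pretrained" dense MLP into a 4-expert, top-1 MoE layer.
dense = DenseMLP(dim=768, hidden=3072)   # pretend this was loaded from a checkpoint
moe = MoELayer(dense, num_experts=4, top_k=1)
tokens = torch.randn(16, 768)
print(moe(tokens).shape)  # torch.Size([16, 768])
```

Seeding every expert with the dense weights gives each one a strong starting point, so fine-tuning only needs to specialize the experts and train the router, which is what lets this approach sidestep the data and compute cost of training an MoE model from scratch.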