MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Neural Information Processing Systems
The sparsely activated mixture of experts (MoE) model presents an effective alternative to densely activated (dense) models, combining improved accuracy with computational efficiency. However, training MoE models from scratch requires extensive data and computational resources, a challenge that limits their widespread adoption. To address this, we introduce MoE Jetpack, a framework designed to fine-tune abundant, easily accessible dense checkpoints into MoE models.
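The abstract does not spell out the conversion procedure, but a common way to seed an MoE model from a dense checkpoint is to copy the pretrained dense MLP weights into each expert and add a learned router, fine-tuning from there rather than training from scratch. The sketch below illustrates that general idea under those assumptions; all names (`DenseMLP`, `MoELayer`, `num_experts`, `top_k`) are illustrative, not the paper's API or its actual method.

```python
# Minimal sketch: building a sparse top-k MoE layer whose experts are all
# initialized from one pretrained dense MLP. Illustrative only.
import copy
import torch
import torch.nn as nn

class DenseMLP(nn.Module):
    """Stand-in for a pretrained transformer MLP block from a dense checkpoint."""
    def __init__(self, dim: int, hidden: int):
        super().__init__()
        self.fc1 = nn.Linear(dim, hidden)
        self.fc2 = nn.Linear(hidden, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

class MoELayer(nn.Module):
    def __init__(self, dense_mlp: DenseMLP, num_experts: int = 4, top_k: int = 1):
        super().__init__()
        dim = dense_mlp.fc1.in_features
        # Each expert starts as a copy of the pretrained dense MLP, so the
        # MoE model inherits the dense checkpoint's knowledge at step zero.
        self.experts = nn.ModuleList(
            copy.deepcopy(dense_mlp) for _ in range(num_experts)
        )
        # Learned gate; trained (along with the experts) during fine-tuning.
        self.router = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights, idx = torch.topk(
            self.router(x).softmax(dim=-1), self.top_k, dim=-1
        )
        out = torch.zeros_like(x)
        # Route each token only to its top-k experts (sparse activation).
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k:k+1] * expert(x[mask])
        return out

# Usage: turn a "pretrained" dense MLP into a 4-expert, top-1 MoE layer.
dense = DenseMLP(dim=768, hidden=3072)   # pretend this was loaded from a checkpoint
moe = MoELayer(dense, num_experts=4, top_k=1)
tokens = torch.randn(16, 768)
print(moe(tokens).shape)  # torch.Size([16, 768])
```

Seeding every expert with the dense weights gives each one a strong starting point, so fine-tuning only needs to specialize the experts and train the router, which is what lets this approach sidestep the data and compute cost of training an MoE model from scratch.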