BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts
–Neural Information Processing Systems
The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance compared to dense models. However, training MoEs from scratch at large scale is prohibitively expensive.