BAM! Just Like That: Simple and Efficient Parameter Upcycling for Mixture of Experts

Neural Information Processing Systems 

The Mixture of Experts (MoE) framework has become a popular architecture for large language models due to its superior performance over dense models. However, training MoEs from scratch at large scale is prohibitively expensive.