Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings
