Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Neural Information Processing Systems
IMP uses a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model and task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.
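As a rough illustration of the alternating idea in the abstract, the sketch below takes each gradient step on a single task's loss, cycling through tasks in round-robin order while all tasks update the same shared parameters. It is not the IMP implementation: the two quadratic losses, the function names (grad_task_a, grad_task_b, alternating_gradient_descent), the round-robin schedule, and the learning rate are all assumptions chosen to keep the example small.

```python
from itertools import cycle


def grad_task_a(w):
    # Hypothetical task A: loss_a(w) = (w[0] - 1)^2, touches only w[0].
    return [2.0 * (w[0] - 1.0), 0.0]


def grad_task_b(w):
    # Hypothetical task B: loss_b(w) = (w[1] + 2)^2, touches only w[1].
    return [0.0, 2.0 * (w[1] + 2.0)]


def alternating_gradient_descent(w, task_grads, steps, lr=0.1):
    """Update shared parameters w using one task's gradient per step."""
    schedule = cycle(task_grads)  # assumed round-robin task order
    for _ in range(steps):
        grad_fn = next(schedule)  # pick the task for this step
        g = grad_fn(w)            # gradient of that task's loss only
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


w_final = alternating_gradient_descent(
    [0.0, 0.0], [grad_task_a, grad_task_b], steps=200
)
```

In this toy setup each task receives 100 of the 200 updates, and the shared parameters approach the minimizer of both losses, w = [1, -2]. A real multimodal setting would replace the hand-written gradients with per-task losses over different modalities and input resolutions.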
alternating gradient descent and mixture-of-expert, integrated multimodal perception, name change
Dec-27-2025, 06:38:01 GMT