Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Neural Information Processing Systems
Integrated Multimodal Perception (IMP) uses a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model and task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) performing gradient descent updates by alternating on diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.
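To make the idea of alternating updates concrete, here is a minimal toy sketch, not the IMP implementation. It assumes a single shared scalar parameter and two hypothetical quadratic losses standing in for different tasks or modalities; the names `grad_task_a`, `grad_task_b`, and `alternating_gd` are illustrative only. MoE routing and varying input resolutions are left out.

```python
# Toy sketch of alternating gradient descent (AGD): one shared parameter,
# two hypothetical tasks, and exactly one task's gradient applied per step.

def grad_task_a(w):
    # Gradient of the hypothetical loss (w - 1)^2.
    return 2.0 * (w - 1.0)


def grad_task_b(w):
    # Gradient of the hypothetical loss (w - 3)^2.
    return 2.0 * (w - 3.0)


def alternating_gd(w0, grads, lr=0.01, steps=2000):
    """Round-robin over the task gradients, updating the shared parameter."""
    w = w0
    for step in range(steps):
        grad_fn = grads[step % len(grads)]  # pick one task for this step
        w -= lr * grad_fn(w)
    return w


w_final = alternating_gd(0.0, [grad_task_a, grad_task_b])
print(w_final)
```

In this toy setting the shared parameter settles near the compromise between the two task optima (around 2), even though no single step ever sees both losses at once.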
Apr-30-2026, 09:16:09 GMT
- Genre:
- Research Report (0.46)
- Industry:
- Health & Medicine > Therapeutic Area > Neurology (0.46)
- Technology: