Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception

Neural Information Processing Systems 

IMP (Integrated Multimodal Perception) uses a novel design that combines Alternating Gradient Descent (AGD) and Mixture-of-Experts (MoE) for efficient model and task scaling. We conduct extensive empirical studies and reveal the following key insights: 1) performing gradient descent updates that alternate across diverse modalities, loss functions, and tasks, with varying input resolutions, efficiently improves the model.
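The sketch below is a rough illustration of the alternating idea only. It is not the IMP implementation. It trains one shared parameter vector on two hypothetical toy tasks that use different loss functions, and each step applies the gradient of exactly one task. The task names, data generators, learning rate, and schedules are all assumptions made for illustration. MoE layers and varying input resolutions are not modeled.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: one shared weight vector serves two "tasks"
# that see different data and use different loss functions.
DIM = 8
true_w = rng.normal(size=DIM)


def make_regression_batch(n=64):
    x = rng.normal(size=(n, DIM))
    y = x @ true_w + 0.01 * rng.normal(size=n)
    return x, y


def make_classification_batch(n=64):
    x = rng.normal(size=(n, DIM))
    y = (x @ true_w > 0).astype(float)
    return x, y


def mse_grad(w, x, y):
    # Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w.
    residual = x @ w - y
    return x.T @ residual / len(y)


def logistic_grad(w, x, y):
    # Gradient of mean binary cross-entropy with logits x @ w.
    p = 1.0 / (1.0 + np.exp(-(x @ w)))
    return x.T @ (p - y) / len(y)


# Each task pairs its own data source with its own loss gradient.
TASKS = {
    "regression": (make_regression_batch, mse_grad),
    "classification": (make_classification_batch, logistic_grad),
}


def alternating_gradient_descent(steps=400, lr=0.1, schedule="round_robin"):
    w = np.zeros(DIM)
    names = list(TASKS)
    for step in range(steps):
        # Choose exactly one task per step; only its loss drives this update.
        if schedule == "round_robin":
            name = names[step % len(names)]
        elif schedule == "random":
            name = names[rng.integers(len(names))]
        else:
            raise ValueError(f"unknown schedule: {schedule}")
        make_batch, grad_fn = TASKS[name]
        x, y = make_batch()
        w -= lr * grad_fn(w, x, y)
    return w


if __name__ == "__main__":
    w = alternating_gradient_descent()
    cos = w @ true_w / (np.linalg.norm(w) * np.linalg.norm(true_w))
    print(f"cosine similarity to true_w: {cos:.3f}")
```

Passing `schedule="random"` samples a task at each step instead of cycling through them. The abstract does not say which schedule IMP uses, so treat both as placeholders. The design point being illustrated is that no step needs a combined multi-task batch or a weighted sum of losses. Each update touches one task, so tasks with different inputs and losses can share parameters without being padded into a single batch.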
