Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs

Neural Information Processing Systems

To build an artificial neural network resembling the biological intelligence system, recent works have unified numerous tasks into a generalist model that processes various tasks with shared parameters and has no task-specific modules. While generalist models achieve promising results on various benchmarks, they suffer performance degradation on some tasks compared with task-specialized models. In this work, we find that interference among different tasks and modalities is the main cause of this degradation. To mitigate such interference, we introduce Conditional Mixture-of-Experts (Conditional MoEs) into generalist models. Routing strategies under different levels of conditions are proposed to account for both training/inference cost and generalization ability. By incorporating the proposed Conditional MoEs, the recently proposed generalist model Uni-Perceiver can effectively mitigate interference across tasks and modalities, and achieves state-of-the-art results on a series of downstream tasks via prompt tuning on 1% of downstream data. Moreover, Conditional MoEs preserve the generalization ability of generalist models to perform zero-shot inference on new tasks, e.g., video-text retrieval and video captioning.
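To make the idea of condition-dependent routing concrete, here is a minimal sketch of a sparse MoE layer whose gate depends on a per-token condition (e.g., a modality id) rather than on the token content alone. This is an illustrative toy in NumPy, not the paper's implementation; the class name, expert shapes, and top-1 routing rule are assumptions for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ConditionalMoELayer:
    """Toy sparse MoE layer: each token is routed to its top-1 expert,
    with gate scores determined by a routing condition (here, a modality id).
    Hypothetical sketch -- not the Uni-Perceiver-MoE implementation."""

    def __init__(self, d_model, n_experts, n_conditions):
        # One linear map per expert; the gate is a table of per-condition logits.
        self.experts = [rng.standard_normal((d_model, d_model)) * 0.02
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((n_conditions, n_experts)) * 0.02

    def forward(self, x, condition_ids):
        # x: (n_tokens, d_model); condition_ids: (n_tokens,) integer conditions
        scores = softmax(self.gate[condition_ids])   # (n_tokens, n_experts)
        top1 = scores.argmax(axis=-1)                # sparse: one expert per token
        out = np.empty_like(x)
        for e in range(len(self.experts)):
            mask = top1 == e
            if mask.any():
                # Scale by the gate probability, as in standard top-1 MoE routing.
                out[mask] = (x[mask] @ self.experts[e]) * scores[mask, e:e + 1]
        return out

layer = ConditionalMoELayer(d_model=16, n_experts=4, n_conditions=2)
tokens = rng.standard_normal((8, 16))
modality = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # e.g. 0 = image tokens, 1 = text tokens
y = layer.forward(tokens, modality)
```

Because the gate here depends only on the condition, all tokens of one modality share an expert, which is what lets such routing separate interfering tasks or modalities while keeping per-token compute constant.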


11fc8c98b46d4cbdfe8157267228f7d7-Supplemental-Conference.pdf

Neural Information Processing Systems

Table 6: Uni-Perceiver model variants used in this paper. Uni-Perceiver-B and Uni-Perceiver-L share the architectures of their corresponding ViT variants. Several settings are changed from the original Uni-Perceiver to improve training stability. The loss weights are adjusted so that all tasks are optimized reasonably, based on early training losses observed in short-epoch experiments. With these settings, Uni-Perceiver can be trained more efficiently.
