DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets Supplementary Material

Neural Information Processing Systems 

Chen et al. [2022] proves that in such a binary-classification problem, an MoE layer converges to an Proof: We will prove this by contradiction. Thus, we show that vanilla MoE does not guarantee convergence with mixture of datasets.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found