DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets Supplementary Material
–Neural Information Processing Systems
Chen et al. [2022] proves that in such a binary-classification problem, an MoE layer converges to an Proof: We will prove this by contradiction. Thus, we show that vanilla MoE does not guarantee convergence with mixture of datasets.
Neural Information Processing Systems
Feb-17-2026, 11:37:06 GMT
- Technology: