DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets Supplementary Material
–Neural Information Processing Systems
Chen et al. [2022] proves that in such a binary-classification problem, an MoE layer converges to an Proof: We will prove this by contradiction. Thus, we show that vanilla MoE does not guarantee convergence with mixture of datasets.
Neural Information Processing Systems
Oct-9-2025, 09:19:38 GMT
- Technology: