DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Supplementary Material
Anonymous Author(s)

Apr-30-2026, 00:06:07 GMT–Neural Information Processing Systems

Here we provide theoretical evidence that vanilla MoE models do not guarantee convergence when mixing multiple datasets. Consider a binary classification problem over P-patch inputs, where each patch has d dimensions and the label is $y \in \{\pm 1\}$. A labeled data point $(x, y)$ therefore has input $x = (x^{(1)}, x^{(2)}, x^{(3)}, \dots, x^{(P)}) \in (\mathbb{R}^d)^P$, a collection of P patches, with y as its label. The data x is generated from K clusters. Chen et al. [2022] prove that in such a binary classification problem, an MoE layer converges to an $o(1)$ test loss and zero training loss.
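To make the data model concrete, here is a minimal Python sketch of a generator for labeled P-patch inputs $x \in (\mathbb{R}^d)^P$ drawn from K clusters with labels $y \in \{\pm 1\}$. It is only loosely modeled on the setting of Chen et al. [2022] and is not their exact construction: the function names, the choice of one label-signal patch $y\,v_k$ plus one cluster-center patch $c_k$ per input, the Gaussian noise patches, and the `noise_std` parameter are all illustrative assumptions.

```python
import math
import random


def random_unit_vector(d, rng):
    """Draw a direction uniformly on the unit sphere in R^d."""
    v = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(t * t for t in v))
    return [t / norm for t in v]


def generate_patch_dataset(n, P, d, K, noise_std=0.1, seed=0):
    """Hypothetical generator: n points x in (R^d)^P, labels y in {-1, +1}, K clusters.

    Assumption (not from the source): each cluster k has a cluster-center
    direction c_k and a label-signal direction v_k. One random patch holds
    y * v_k, another holds c_k, and the remaining P - 2 patches are
    isotropic Gaussian noise with standard deviation noise_std.
    """
    assert P >= 2, "need at least two patches for the signal and the cluster center"
    rng = random.Random(seed)
    centers = [random_unit_vector(d, rng) for _ in range(K)]  # c_1, ..., c_K
    signals = [random_unit_vector(d, rng) for _ in range(K)]  # v_1, ..., v_K
    data = []
    for _ in range(n):
        k = rng.randrange(K)     # cluster that generated this point
        y = rng.choice([-1, 1])  # binary label
        # Start with pure-noise patches, then overwrite two of them.
        x = [[rng.gauss(0.0, noise_std) for _ in range(d)] for _ in range(P)]
        sig_idx, ctr_idx = rng.sample(range(P), 2)
        x[sig_idx] = [y * t for t in signals[k]]  # label-signal patch y * v_k
        x[ctr_idx] = list(centers[k])             # cluster-center patch c_k
        data.append((x, y, k))
    return data


if __name__ == "__main__":
    data = generate_patch_dataset(n=8, P=4, d=16, K=3)
    x, y, k = data[0]
    print(len(x), len(x[0]), y, k)  # P patches, each of dimension d
```

Returning the cluster index k alongside $(x, y)$ is only a convenience for inspecting which cluster each point came from; a classifier trained on this data would see only $(x, y)$.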