Learning to Specialize: Joint Gating-Expert Training for Adaptive MoEs in Decentralized Settings

Jun-14-2026, 09:08:01 GMT–Neural Information Processing Systems

Mixture-of-Experts (MoEs) achieve scalability by dynamically activating subsets of their components. Yet, understanding how expertise emerges through joint training of gating mechanisms and experts remains incomplete, especially in scenarios without clear task partitions. Motivated by inference costs and data heterogeneity, we study how joint training of gating functions and experts can dynamically allocate domain-specific expertise across multiple underlying data distributions. As an outcome of our framework, we develop an instance tailored specifically to decentralized training scenarios, introducing Dynamically Decentralized Orchestration of MoEs or DDOME. DDOME leverages heterogeneity emerging from distributional shifts across decentralized data sources to specialize experts dynamically. By integrating a pretrained common expert to inform a gating function, DDOMEachieves personalized expert subset selection on-the-fly, facilitating just-in-time personalization.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Jun-14-2026, 09:08:01 GMT

Conferences PDF

Add feedback

Genre:
- Research Report
  - New Finding (1.00)
  - Experimental Study (1.00)

Industry:
- Health & Medicine > Therapeutic Area (0.46)
- Education > Educational Technology
  - Educational Software > Computer Based Training (0.34)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language (1.00)
  - Vision (0.93)
  - Machine Learning
    - Statistical Learning (0.67)
    - Neural Networks (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found