MEPT: Mixture of Expert Prompt Tuning as a Manifold Mapper

Zeng, Runjia, Sun, Guangyan, Wang, Qifan, Geng, Tong, Dianat, Sohail, Han, Xiaotian, Rao, Raghuveer, Zhang, Xueling, Han, Cheng, Huang, Lifu, Liu, Dongfang

Sep-16-2025–arXiv.org Artificial Intelligence

Considering deep neural networks as manifold mappers, the pretrain-then-fine-tune paradigm can be interpreted as a two-stage process: pretrain establishes a broad knowledge base, and fine-tune adjusts the model parameters to activate specific neural pathways to align with the target manifold. Although prior fine-tuning approaches demonstrate success, their rigid parameter space limits their ability to dynamically activate appropriate neural pathways, rendering them ill-equipped to adapt flexibly to the diverse and evolving data distributions. In light of this view, we propose a novel approach, Mixture of Expert Prompt Tuning (MEPT), as an effective and efficient manifold-mapping framework. MEPT leverages the Mixture of Experts architecture by integrating multiple prompt experts to adaptively learn diverse and non-stationary data distributions. Empirical evaluations demonstrate that MEPT outperforms several state-of-the-art parameter efficient baselines on SuperGLUE, achieving notable improvements in mean accuracy (e.g., 1.94%) while significantly reducing activated prompts by 79.25%. The effectiveness of MEPT is further supported by theoretical insights from manifold learning and validated through neural activation pathway visualization results. Our code is avaliable at https://runjia.tech/emnlp_mept/.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

Sep-16-2025

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.92)
- Asia (0.68)

Genre:
- Research Report
  - New Finding (0.93)
  - Promising Solution (0.66)

Industry:
- Government (0.67)
- Education (0.48)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Natural Language > Large Language Model (1.00)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)