Multitask Prompt Tuning Enables Parameter-Efficient Transfer Learning

Zhen Wang, Rameswar Panda, Leonid Karlinsky, Rogerio Feris, Huan Sun, Yoon Kim

arXiv.org Artificial Intelligence 

Prompt tuning, in which a base pretrained model is adapted to each task by conditioning on learned prompt vectors, has emerged as a promising approach for efficiently adapting large language models to multiple downstream tasks. However, existing methods typically learn soft prompt vectors from scratch, and it remains unclear how to exploit the rich cross-task knowledge carried by prompt vectors in a multitask learning setting. We propose multitask prompt tuning (MPT), which first learns a single transferable prompt by distilling knowledge from multiple task-specific source prompts. We then learn multiplicative low-rank updates to this shared prompt to efficiently adapt it to each downstream target task. Extensive experiments on 23 NLP datasets demonstrate that our proposed approach outperforms state-of-the-art methods, including the full finetuning baseline in some cases, despite tuning only 0.035% as many task-specific parameters.

Finetuning pretrained language models (PLMs) has led to significant improvements across various downstream NLP tasks (Devlin et al., 2019; Howard & Ruder, 2018; Raffel et al., 2020). However, the conventional paradigm of full task-specific finetuning (FT) is difficult to scale to multiple tasks, given that modern PLMs can have hundreds of millions (or even billions) of parameters. There has thus been growing interest in developing parameter-efficient methods for model tuning (Houlsby et al., 2019; Lester et al., 2021; Ding et al., 2022), where the goal is to learn only a small number of additional parameters per task while achieving performance comparable to full finetuning.

Work done during an internship at MIT-IBM Watson AI Lab.

Figure 2: Parameter efficiency on GLUE (left) and SuperGLUE (right). Our multitask prompt tuning (MPT) approach, which transfers a single shared prompt learned from multiple source tasks using prompt decomposition and distillation, maintains high accuracy (y-axis) while finetuning only a small number of parameters per task (x-axis).
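To make the prompt-decomposition idea concrete, the sketch below shows one way a shared soft prompt could be adapted to a target task with a multiplicative low-rank (here rank-1) update, as described in the abstract. This is a minimal PyTorch illustration under stated assumptions, not the authors' released implementation; the class and parameter names (DecomposedPrompt, shared_prompt, u, v) are hypothetical.

```python
import torch
import torch.nn as nn

class DecomposedPrompt(nn.Module):
    """Illustrative sketch of a shared prompt with a multiplicative rank-1 update.

    A shared soft prompt P* of shape (prompt_len, d_model), e.g. distilled from
    multiple source-task prompts, is specialized to a target task as
        P_task = P* * (u @ v^T),
    where u has shape (prompt_len, 1) and v has shape (d_model, 1).
    Only the small task-specific factors u and v need to be tuned per task.
    """

    def __init__(self, prompt_len: int, d_model: int):
        super().__init__()
        # Shared prompt (assumed already transferred from source tasks).
        self.shared_prompt = nn.Parameter(torch.randn(prompt_len, d_model) * 0.02)
        # Rank-1 task-specific factors, initialized near 1 so the task prompt
        # starts close to the shared prompt.
        self.u = nn.Parameter(torch.ones(prompt_len, 1))
        self.v = nn.Parameter(torch.ones(d_model, 1))

    def forward(self) -> torch.Tensor:
        # Hadamard product of the shared prompt with the rank-1 update matrix.
        return self.shared_prompt * (self.u @ self.v.t())


# Usage sketch: prepend the task prompt to token embeddings of a frozen PLM.
prompt = DecomposedPrompt(prompt_len=100, d_model=768)
task_prompt = prompt()                       # shape (100, 768)
batch_embeds = torch.randn(4, 128, 768)      # embeddings from a frozen encoder (dummy data)
inputs = torch.cat([task_prompt.unsqueeze(0).expand(4, -1, -1), batch_embeds], dim=1)
print(inputs.shape)                          # torch.Size([4, 228, 768])
```

In this sketch, the per-task trainable parameters are only the rank-1 factors (prompt_len + d_model values), which is how a multiplicative low-rank update keeps the task-specific footprint far smaller than learning a full prompt per task.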
