Musketeer (All for One, and One for All): A Generalist Vision-Language Model with Task Explanation Prompts
Zhaoyang Zhang, Yantao Shen, Kunyu Shi, Zhaowei Cai, Jun Fang, Siqi Deng, Hao Yang, Davide Modolo, Zhuowen Tu, Stefano Soatto
arXiv.org Artificial Intelligence
We present a sequence-to-sequence vision-language model whose parameters are jointly trained on all tasks (all for one) and fully shared among multiple tasks (one for all), resulting in a single model which we named Musketeer. The integration of knowledge across heterogeneous tasks is enabled by a novel feature called Task Explanation Prompt (TEP). TEP reduces interference among tasks, allowing the model to focus on their shared structure. With a single model, Musketeer achieves results comparable to or better than strong baselines trained on single tasks, almost uniformly across multiple tasks.
May 11, 2023