Foundation Model is Efficient Multimodal Multitask Model Selector
Meng, Fanqing, Shao, Wenqi, Peng, Zhanglin, Jiang, Chonghe, Zhang, Kaipeng, Qiao, Yu, Luo, Ping
arXiv.org Artificial Intelligence
This paper investigates an under-explored but important problem: given a collection of pre-trained neural networks, predict their performance on each of several multi-modal tasks, such as image recognition, referring, captioning, visual question answering, and text question answering, without fine-tuning them. A brute-force approach is to fine-tune every model on every target dataset, which incurs high computational costs. Although recent approaches employ lightweight metrics to measure models' transferability, they often depend heavily on prior knowledge of a single task, making them inapplicable in a multi-modal multi-task scenario. To tackle this issue, we propose an efficient multi-task model selector (EMMS), which employs large-scale foundation models to transform diverse label formats from different downstream tasks, such as categories, texts, and bounding boxes, into a unified noisy label embedding. EMMS estimates a model's transferability through a simple weighted linear regression, which can be solved efficiently by an alternating minimization algorithm with a convergence guarantee. Extensive experiments on 5 downstream tasks with 24 datasets show that EMMS is fast, effective, and generic enough to assess the transferability of pre-trained models, making it the first model selection method in the multi-task scenario. For instance, compared with the state-of-the-art method LogME enhanced by our label embeddings, EMMS achieves performance gains of 9.0%, 26.3%, 20.1%, 54.8%, and 12.2% on image recognition, referring, captioning, visual question answering, and text question answering, respectively, while bringing 5.13x, 6.29x, 3.59x, 6.19x, and 5.66x speedups in wall-clock time. The code is available at https://github.com/OpenGVLab/Multitask-Model-Selector.
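The weighted linear regression solved by alternating minimization can be illustrated with a short sketch. This is a minimal illustration under stated assumptions, not the authors' implementation (that lives in the linked repository). It assumes the candidate model's features form a matrix `X` of shape (n, d). It assumes the K foundation-model label embeddings are stacked into an array `Z` of shape (K, n, D), and that the mixing weights `t` lie on the probability simplex. It also assumes the t-step is handled by projected gradient descent and that the returned score is the negative mean squared residual. The function names `emms_score` and `project_to_simplex` are hypothetical.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {t : t >= 0, sum(t) = 1} (sort-based method)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    ks = np.arange(1, len(v) + 1)
    # Last index j with u[j] - (css[j] - 1) / (j + 1) > 0
    rho = np.nonzero(u * ks > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def emms_score(X, Z, n_iters=50, pg_steps=100):
    """Hypothetical EMMS-style transferability score via alternating minimization.

    Minimizes ||X @ w - sum_k t[k] * Z[k]||_F^2 over w and simplex weights t.
    X: (n, d) features extracted by one candidate pre-trained model.
    Z: (K, n, D) label embeddings from K foundation models (assumed layout).
    """
    K = Z.shape[0]
    A = Z.reshape(K, -1).T                      # (n*D, K): column k is flattened Z[k]
    L = max(np.linalg.norm(A, 2) ** 2, 1e-12)   # Lipschitz constant of the t-step gradient
    t = np.full(K, 1.0 / K)                     # start from uniform weights
    for _ in range(n_iters):
        # w-step: ordinary least squares against the current fused label embedding
        Zt = np.tensordot(t, Z, axes=1)         # (n, D)
        w = np.linalg.lstsq(X, Zt, rcond=None)[0]
        # t-step: projected gradient descent on 0.5 * ||A t - vec(X w)||^2
        b = (X @ w).ravel()
        for _ in range(pg_steps):
            t = project_to_simplex(t - A.T @ (A @ t - b) / L)
    # Refit w for the final t and report the negative mean squared residual
    Zt = np.tensordot(t, Z, axes=1)
    w = np.linalg.lstsq(X, Zt, rcond=None)[0]
    return -np.mean((X @ w - Zt) ** 2), t
```

To rank candidates under this sketch, call `emms_score` once per pre-trained model on the same `Z` and sort by score. A higher (less negative) score means that model's features linearly explain the fused label embedding better. Each iteration stays cheap: the w-step is an ordinary least-squares solve, and the t-step is a K-dimensional problem constrained to the simplex.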
Aug-11-2023