Multi-Modal Few-Shot Object Detection with Meta-Learning-Based Cross-Modal Prompting

Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang

arXiv.org Artificial Intelligence 

Abstract

We study multi-modal few-shot object detection (FSOD) in this paper, using both few-shot visual examples and class semantic information for detection, which are complementary to each other by definition. Most previous works on multi-modal FSOD are fine-tuning-based, which is inefficient for online applications. Moreover, these methods usually require expert knowledge, such as class names, to extract class semantic embeddings, which is hard to obtain for rare classes. Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning, which learn generalizable few-shot and zero-shot object detection models, respectively, without fine-tuning. Specifically, we combine the few-shot visual classifier and the text classifier, learned via meta-learning and prompt-based learning respectively, to build the multi-modal classifier and detection models. In addition, to fully exploit the pre-trained language models, we propose meta-learning-based cross-modal prompting to generate soft prompts for novel classes present in few-shot visual examples, which are then used to learn the text classifier. Knowledge distillation is introduced to learn the soft prompt generator without using human prior knowledge of class names, which may not be available for rare classes. Our insight is that the few-shot support images naturally include related context information and semantics of the class. We comprehensively evaluate the proposed multi-modal FSOD models on multiple few-shot object detection benchmarks, achieving promising results.

1 Introduction

Object detection is one of the most fundamental tasks in computer vision. Recently, deep learning-based methods [39, 38, 32, 3] have achieved great progress in this field.
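To make the fused classifier described in the abstract concrete, the following is a minimal PyTorch-style sketch of combining a meta-learned visual prototype classifier with a text classifier derived from a prompted language model. It is an illustration under stated assumptions, not the paper's implementation; the function name multimodal_classifier_logits and the hyper-parameters alpha and tau are hypothetical.

    import torch
    import torch.nn.functional as F

    def multimodal_classifier_logits(query_feats, support_feats, text_embed,
                                     alpha=0.5, tau=0.07):
        """Score query RoI features against a fused visual/text classifier.

        query_feats:   (Q, D) RoI features from the query image
        support_feats: (K, D) features of the K few-shot support crops for one class
        text_embed:    (1, D) class embedding from a prompted language model
        alpha:         visual/text mixing weight (assumed hyper-parameter)
        tau:           softmax temperature (assumed hyper-parameter)
        """
        # Meta-learning branch: the class prototype is the mean support feature.
        proto = support_feats.mean(dim=0, keepdim=True)              # (1, D)

        # Cosine-similarity logits for each branch.
        q = F.normalize(query_feats, dim=-1)
        vis_logits = q @ F.normalize(proto, dim=-1).t() / tau        # (Q, 1)
        txt_logits = q @ F.normalize(text_embed, dim=-1).t() / tau   # (Q, 1)

        # Fuse the complementary few-shot visual and text classifiers.
        return alpha * vis_logits + (1 - alpha) * txt_logits

Repeating this per class and stacking the columns yields multi-class detection logits; the fusion happens at the classifier level, so neither branch needs fine-tuning at test time.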
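The cross-modal prompting and knowledge-distillation step can likewise be sketched. Here SoftPromptGenerator and distillation_loss are hypothetical names; the pre-trained language model that would consume the generated soft tokens (prepended to its input embeddings) and return a class embedding is elided, since the abstract does not fix that interface.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SoftPromptGenerator(nn.Module):
        """Map few-shot support features to M soft prompt tokens (a sketch)."""
        def __init__(self, feat_dim, token_dim, num_tokens=4):
            super().__init__()
            self.proj = nn.Linear(feat_dim, num_tokens * token_dim)
            self.num_tokens, self.token_dim = num_tokens, token_dim

        def forward(self, support_feats):              # (K, feat_dim)
            pooled = support_feats.mean(dim=0)         # aggregate the K shots
            return self.proj(pooled).view(self.num_tokens, self.token_dim)

    def distillation_loss(student_embed, teacher_embed):
        """Align the class embedding obtained from generated soft prompts
        (student) with one from human-written class-name prompts (teacher),
        so the generator can later run without access to class names."""
        return 1 - F.cosine_similarity(student_embed, teacher_embed, dim=-1).mean()

The distillation teacher exists only during training, where class names are known; at test time on rare classes, the soft prompts are produced purely from the support images, matching the paper's stated motivation.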
