cd-fsod
CDFormer: Cross-Domain Few-Shot Object Detection Transformer Against Feature Confusion
Meng, Boyuan, Zhang, Xiaohan, Li, Peilin, Wu, Zhe, Li, Yiming, Zhao, Wenkai, Yu, Beinan, Shen, Hui-Liang
Object-background confusion is the failure to separate target objects from the background. As illustrated in Figure 1(a), in underwater scenes the boundaries between the target object and the background are often ambiguous, leading to missed detections. Object-object confusion is the failure to distinguish objects of different classes. As illustrated in Figure 1(b), similarity between classes results in false detections. In CD-FSOD, CD-ViTO [8] is the state-of-the-art method; it devises several fine-tuning modules and achieves significant performance improvements. To address object-background confusion, CD-ViTO re-weights manually selected background features and combines them with object features in a weighted sum. However, manually designed features lack adaptability when the target domain distribution differs [4], [24]. To address object-object confusion, CD-ViTO [8] enhances class distinction by directly adjusting the support class features.
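The weighted-sum idea described above can be illustrated with a minimal sketch. This is not CD-ViTO's actual module: the function name `fuse_features`, the mixing coefficient `alpha`, and the normalization of the background weights are all assumptions made for illustration, and features are plain Python lists rather than detector embeddings.

```python
from typing import List, Sequence


def fuse_features(
    object_feat: Sequence[float],
    background_feats: Sequence[Sequence[float]],
    background_weights: Sequence[float],
    alpha: float = 0.5,
) -> List[float]:
    """Combine one object feature with re-weighted background features.

    Hypothetical illustration only: `alpha` and the weight normalization
    are assumptions, not values taken from CD-ViTO.
    """
    if len(background_feats) != len(background_weights):
        raise ValueError("one weight is required per background feature")
    total = sum(background_weights)
    if total <= 0:
        raise ValueError("background weights must sum to a positive value")

    dim = len(object_feat)
    # Re-weight the background features: normalized weighted average.
    background = [0.0] * dim
    for feat, weight in zip(background_feats, background_weights):
        if len(feat) != dim:
            raise ValueError("all features must share the same dimension")
        for i, value in enumerate(feat):
            background[i] += (weight / total) * value

    # Weighted sum of the object feature and the pooled background feature.
    return [(1.0 - alpha) * o + alpha * b for o, b in zip(object_feat, background)]


fused = fuse_features([1.0, 0.0], [[0.0, 1.0], [0.0, 3.0]], [1.0, 1.0], alpha=0.5)
print(fused)
```

In this sketch the background weights are fixed inputs, which mirrors the limitation noted above: hand-chosen weights do not adapt when the target domain distribution changes.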
NTIRE 2025 Challenge on Cross-Domain Few-Shot Object Detection: Methods and Results
Fu, Yuqian, Qiu, Xingyu, Ren, Bin, Fu, Yanwei, Timofte, Radu, Sebe, Nicu, Yang, Ming-Hsuan, Van Gool, Luc, Zhang, Kaijin, Nong, Qingpeng, Dong, Xiugang, Gao, Hong, Zhou, Xiangsheng, Pan, Jiancheng, Liu, Yanxing, He, Xiao, Li, Jiahao, Sun, Yuze, Huang, Xiaomeng, Zhang, Zhenyu, Ma, Ran, Liu, Yuhan, Zhuang, Zijian, Yi, Shuai, Zou, Yixiong, Hong, Lingyi, Chen, Mingxi, Li, Runze, Sheng, Xingdong, Zhang, Wenqiang, Chen, Weisen, Yan, Yongxin, Chen, Xinguo, Shao, Yuanjie, Zuo, Zhengrong, Sang, Nong, Wu, Hao, Sun, Haoran, Hu, Shuming, Zhang, Yan, Shi, Zhiguang, Zhang, Yu, Chen, Chao, Wang, Tao, Feng, Da, Zhuo, Linhai, Lin, Ziming, Huang, Yali, Me, Jie, Yang, Yiming, Guo, Mi, Jiu, Mingyuan, Xu, Mingliang, Xiong, Maomao, Zhang, Qunshu, Cao, Xinyu, Yang, Yuqing, Sheng, Dianmo, Zhao, Xuanpu, Li, Zhiyu, Ding, Xuyang, Li, Wenqian
Cross-Domain Few-Shot Object Detection (CD-FSOD) poses significant challenges to existing object detection and few-shot detection models when applied across domains. In conjunction with NTIRE 2025, we organized the 1st CD-FSOD Challenge, aiming to advance the performance of current object detectors on entirely novel target domains with only limited labeled data. The challenge attracted 152 registered participants, received submissions from 42 teams, and concluded with 13 teams making valid final submissions. Participants approached the task from diverse perspectives, proposing novel models that achieved new state-of-the-art (SOTA) results under both open-source and closed-source settings. In this report, we present an overview of the 1st NTIRE 2025 CD-FSOD Challenge, highlighting the proposed solutions and summarizing the results submitted by the participants.
Cross-Domain Few-Shot Object Detection via Enhanced Open-Set Object Detector
Fu, Yuqian, Wang, Yu, Pan, Yixuan, Huai, Lian, Qiu, Xingyu, Shangguan, Zeyu, Liu, Tong, Kong, Lingjie, Fu, Yanwei, Van Gool, Luc, Jiang, Xingqun
This paper addresses the challenge of cross-domain few-shot object detection (CD-FSOD), aiming to develop an accurate object detector for novel domains with minimal labeled examples. Transformer-based open-set detectors, e.g., DE-ViT, have excelled in both open-vocabulary object detection and traditional few-shot object detection, detecting categories beyond those seen during training. We thus naturally raise two key questions: 1) Can such open-set detection methods easily generalize to CD-FSOD? 2) If not, how can the results of open-set methods be enhanced when faced with significant domain gaps? To address the first question, we introduce several metrics to quantify domain variances and establish a new CD-FSOD benchmark with diverse domain metric values. Several state-of-the-art (SOTA) open-set object detection methods are evaluated on this benchmark, with evident performance degradation observed across out-of-domain datasets. This indicates that open-set detectors fail when adopted directly for CD-FSOD. Subsequently, to overcome the performance degradation and to answer the second question, we enhance the vanilla DE-ViT. With several novel components, including finetuning, a learnable prototype module, and a lightweight attention module, we present an improved Cross-Domain Vision Transformer for CD-FSOD (CD-ViTO). Experiments show that CD-ViTO achieves impressive results on both out-of-domain and in-domain target datasets, establishing new SOTAs for both CD-FSOD and FSOD. All datasets, code, and models will be released to the community.