A Closer Look at the CLS Token for Cross-Domain Few-Shot Learning

Mar-21-2026, 18:44:23 GMT–Neural Information Processing Systems

The Vision Transformer (ViT) has shown great power in learning from large-scale datasets. However, collecting sufficient data in domains that require expert knowledge remains difficult. To address this problem, Cross-Domain Few-Shot Learning (CDFSL) has been proposed to transfer knowledge learned from abundant source-domain data to target domains where only scarce data is available. In this paper, we identify an intriguing phenomenon that previous work on ViT-based CDFSL has overlooked: leaving the CLS token randomly initialized, instead of loading its source-domain-trained parameters, can consistently improve target-domain performance. We find that, owing to the inherent structure of ViT, the CLS token naturally absorbs domain information, which is represented as the low-frequency components of images in the Fourier frequency space. Based on this phenomenon and its interpretation, we further propose a CDFSL method that decouples the domain information in the CLS token during source-domain training and adapts the CLS token on the target domain for efficient few-shot learning.
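The abstract links the domain information absorbed by the CLS token to the low-frequency components of images in Fourier space. To make "low-frequency component" concrete, here is a minimal NumPy sketch that splits a grayscale image into low- and high-frequency parts using a centered square low-pass mask on the 2-D spectrum. The helper name `split_frequencies`, the square mask shape, and the `radius` cutoff are illustrative assumptions, not the paper's procedure.

```python
import numpy as np


def split_frequencies(image: np.ndarray, radius: int) -> tuple[np.ndarray, np.ndarray]:
    """Split a 2-D grayscale image into low- and high-frequency parts.

    Hypothetical helper for illustration only: the square low-pass mask of
    half-width `radius` (in frequency bins) is an assumed, simple choice.
    """
    if image.ndim != 2:
        raise ValueError("expected a 2-D (H, W) array")
    if radius < 0:
        raise ValueError("radius must be non-negative")
    h, w = image.shape
    # Compute the 2-D spectrum and move the zero-frequency (DC) term to the center.
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    cy, cx = h // 2, w // 2
    # Boolean mask selecting frequencies within `radius` bins of DC on each axis.
    mask = np.zeros((h, w), dtype=bool)
    mask[max(cy - radius, 0):cy + radius + 1, max(cx - radius, 0):cx + radius + 1] = True
    # Keep only the central (low-frequency) coefficients and invert the transform.
    # The mask is symmetric about DC, so the result is real up to rounding error.
    low = np.fft.ifft2(np.fft.ifftshift(spectrum * mask)).real
    # Everything the low-pass filter removed is the high-frequency residual.
    high = image - low
    return low, high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32))
    low, high = split_frequencies(img, radius=2)
    # The two parts always sum back to the original image.
    print(np.allclose(low + high, img))
```

A square mask keeps the sketch short; circular or Gaussian low-pass filters are equally common. Smaller values of `radius` keep only coarse, slowly varying structure such as overall brightness in `low`, while fine edges and texture move into `high`.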

artificial intelligence, CLS token, domain information

