pfenet
Singular Value Fine-tuning: Few-shot Segmentation requires Few-parameters Fine-tuning - Supplementary Material
Different fine-tuning strategies: In Figure 1, we visualize the mIoU curves of different fine-tuning strategies. Both layer-based and convolution-based fine-tuning methods lead to over-fitting. This result shows that traditional fine-tuning methods are not suitable for few-shot segmentation tasks. Directly fine-tuning the parameters of the backbone in few-shot learning harms the robustness of FSS models. Therefore, we propose a novel fine-tuning strategy, namely SVF.
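The excerpt above names SVF but gives no details of it. As a minimal sketch, assuming (from the title) that SVF factors each backbone weight with a singular value decomposition and updates only the singular values, the idea could look like the following. The function names `svd_decompose` and `recompose`, the kernel shape, and the scaling used as a stand-in for a training update are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def svd_decompose(weight):
    """Flatten a conv kernel (out, in, kh, kw) to 2-D and factor it as U @ diag(S) @ Vt."""
    out_channels = weight.shape[0]
    matrix = weight.reshape(out_channels, -1)
    U, S, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U, S, Vt

def recompose(U, S, Vt, shape):
    """Rebuild the kernel from its factors; U * S scales the columns of U by S."""
    return ((U * S) @ Vt).reshape(shape)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4, 3, 3))   # hypothetical conv kernel with 288 weights
U, S, Vt = svd_decompose(W)

# Assumption: U and Vt stay frozen and only the 8 singular values are tuned.
# The scaling below is a placeholder for a gradient update on S.
S_tuned = S * 1.1
W_tuned = recompose(U, S_tuned, Vt, W.shape)
```

In this toy case the trainable parameters drop from 288 kernel weights to 8 singular values, which is the sense in which such a scheme would be "few-parameters" fine-tuning.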
Few-shot Semantic Segmentation with Self-supervision from Pseudo-classes
Li, Yiwen, Data, Gratianus Wesley Putra, Fu, Yunguan, Hu, Yipeng, Prisacariu, Victor Adrian
Despite the success of deep learning methods for semantic segmentation, few-shot semantic segmentation remains a challenging task due to the limited training data and the generalisation requirement for unseen classes. While recent progress has been particularly encouraging, we discover that existing methods tend to have poor performance in terms of meanIoU when query images contain other semantic classes besides the target class. To address this issue, we propose a novel self-supervised task that generates random pseudo-classes in the background of the query images, providing extra training data that would otherwise be unavailable when predicting individual target classes. To that end, we adopted superpixel segmentation for generating the pseudo-classes. With this extra supervision, we improved the meanIoU performance of the state-of-the-art method by 2.5% and 5.1% on the one-shot tasks, as well as 6.7% and 4.4% on the five-shot tasks, on the PASCAL-5i and COCO benchmarks, respectively.
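The pseudo-class idea in the abstract above can be sketched roughly as follows, assuming a superpixel label map is already available (for example from a SLIC-style algorithm) and that pseudo-classes are drawn only from superpixels that do not touch the target mask. The function name `pseudo_class_masks` and its parameters are hypothetical, and the abstract does not say how the resulting masks enter the training loss.

```python
import numpy as np

def pseudo_class_masks(superpixels, target_mask, num_pseudo, rng):
    """Pick random background superpixels and return one binary mask per pseudo-class."""
    background_ids = np.unique(superpixels[~target_mask])
    target_ids = np.unique(superpixels[target_mask])
    # Keep only superpixels that lie entirely in the background.
    candidates = np.setdiff1d(background_ids, target_ids)
    k = min(num_pseudo, candidates.size)
    if k == 0:
        return []
    chosen = rng.choice(candidates, size=k, replace=False)
    return [superpixels == sp_id for sp_id in chosen]

# Toy 4x4 image split into four 2x2 superpixels with ids 0..3.
superpixels = np.repeat(np.repeat(np.arange(4).reshape(2, 2), 2, axis=0), 2, axis=1)
target_mask = superpixels == 0          # pretend superpixel 0 is the target object
rng = np.random.default_rng(0)
masks = pseudo_class_masks(superpixels, target_mask, num_pseudo=2, rng=rng)
```

Each returned mask could then be treated as the ground truth of an extra, randomly generated segmentation task on the same query image.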
Improved Few-shot Segmentation by Redefinition of the Roles of Multi-level CNN Features
Wang, Zhijie, Suganuma, Masanori, Okatani, Takayuki
This study is concerned with few-shot segmentation, i.e., segmenting the region of an unseen object class in a query image, given support image(s) of its instances. The current methods rely on the pretrained CNN features of the support and query images. The key to good performance is the proper fusion of their mid-level and high-level features; the former contain shape-oriented information, while the latter contain class-oriented information. Current state-of-the-art methods follow the approach of Tian et al., which gives the mid-level features the primary role and the high-level features the secondary role. In this paper, we reinterpret this widely employed approach by redefining the roles of the multi-level features; we swap the primary and secondary roles. Specifically, we regard the current methods as improving the initial estimate generated from the high-level features using the mid-level features. This reinterpretation suggests a new application of the current methods: to apply the same network multiple times to iteratively update the estimate of the object's region, starting from its initial estimate. Our experiments show that this method is effective and has updated the previous state-of-the-art on COCO-20$^i$ in the 1-shot and 5-shot settings and on PASCAL-5$^i$ in the 1-shot setting.
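The iterative scheme described in the abstract above amounts to feeding a network's output estimate back in as its next input. A minimal sketch of that control flow, with a toy function standing in for the network, might look like this. `iterative_refine`, `toy_step`, and the fixed `target` map are illustrative assumptions; the actual network, its inputs, and the number of iterations are not given in the abstract.

```python
import numpy as np

def iterative_refine(refine_step, initial_estimate, num_iters):
    """Apply the same refinement function repeatedly, feeding each output back in."""
    estimate = initial_estimate
    for _ in range(num_iters):
        estimate = refine_step(estimate)
    return estimate

# Toy stand-in for the network: pull the estimate halfway toward a fixed target map.
target = np.array([[0.0, 1.0], [1.0, 0.0]])

def toy_step(estimate):
    return 0.5 * (estimate + target)

initial = np.full((2, 2), 0.5)           # uninformative initial estimate
refined = iterative_refine(toy_step, initial, num_iters=3)
```

In the toy case each pass halves the remaining error, so three passes reduce it from 0.5 to 0.0625 per pixel; a real network offers no such guarantee.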