SegGen: Supercharging Segmentation Models with Text2Mask and Mask2Img Synthesis
Ye, Hanrong, Kuen, Jason, Liu, Qing, Lin, Zhe, Price, Brian, Xu, Dan
arXiv.org Artificial Intelligence
Figure 1: Effectiveness of SegGen. Training with synthetic data generated by the proposed SegGen significantly boosts the performance of the state-of-the-art segmentation model Mask2Former (Cheng et al., 2022) on evaluation benchmarks including ADE20K (Zhou et al., 2016) and COCO (Lin et al., 2014), while making it more robust to challenging images from other domains (the three columns on the left are from PASCAL (Everingham et al., 2015); the three on the right are synthesized by the text-to-image generation model Kandinsky 2 (Forever, 2023)).

We propose SegGen, a highly effective training data generation method for image segmentation that pushes the performance limits of state-of-the-art segmentation models. On the highly competitive ADE20K and COCO benchmarks, our data generation method markedly improves the performance of state-of-the-art segmentation models in semantic segmentation, panoptic segmentation, and instance segmentation. Notably, in terms of ADE20K mIoU, Mask2Former R50 improves from 47.2 to 49.9 (+2.7), and Mask2Former Swin-L improves from 56.1 to 57.4 (+1.3). These results strongly suggest that SegGen is effective even when abundant human-annotated training data is used. Moreover, training with our synthetic data makes the segmentation models more robust to unseen domains.

Image segmentation explores the identification of objects in visual inputs at the pixel level. Depending on the emphasis placed on category and instance membership information, researchers have divided image segmentation into several tasks (Long et al., 2015; Chen et al., 2015; Kirillov et al., 2019; Qi et al., 2022). For example, semantic segmentation studies pixel-level understanding of object categories, instance segmentation focuses on grouping pixels into instances, and panoptic segmentation considers both.

Figure 2: Illustration of the workflow of our proposed SegGen.
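The ADE20K results quoted in the abstract are reported in mIoU (mean intersection over union). As a rough illustration of what that metric measures, the sketch below computes it for a single pair of flattened label maps in plain Python. It is not the benchmarks' official evaluation code: the function name `mean_iou`, the flat-list input, and the `ignore_label=255` default are assumptions made for this example, and benchmark tooling typically accumulates intersections and unions over the whole validation set before averaging over classes.

```python
from collections import defaultdict


def mean_iou(pred, target, ignore_label=255):
    """Mean IoU over all classes that appear in `pred` or `target`.

    `pred` and `target` are equal-length sequences of integer class labels
    (a flattened segmentation map). Pixels whose ground-truth label equals
    `ignore_label` are skipped. All names here are illustrative assumptions.
    """
    intersection = defaultdict(int)  # pixels where prediction and truth agree, per class
    pred_area = defaultdict(int)     # pixels predicted as each class
    target_area = defaultdict(int)   # pixels labeled as each class in the ground truth

    for p, t in zip(pred, target):
        if t == ignore_label:
            continue
        pred_area[p] += 1
        target_area[t] += 1
        if p == t:
            intersection[p] += 1

    classes = set(pred_area) | set(target_area)
    if not classes:
        return 0.0

    ious = []
    for c in classes:
        # |A ∪ B| = |A| + |B| - |A ∩ B|; never zero for a class in `classes`.
        union = pred_area[c] + target_area[c] - intersection[c]
        ious.append(intersection[c] / union)
    return sum(ious) / len(ious)


# Toy example: six pixels, three classes.
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
print(mean_iou(pred, target))
```

In the toy example the per-class IoUs are 1/3, 2/3, and 1/2, so the mean is 0.5. Under this definition, a gain such as 47.2 to 49.9 corresponds to a 2.7-point increase in the class-averaged overlap between predicted and ground-truth regions.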
Nov-6-2023