CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation

Neural Information Processing Systems 

To demonstrate that the effectiveness of our approach does not depend on a particular image backbone, we replace the image backbone of CoupAlign with other widely used networks, ResNet-101 [3] and Darknet-53 [9], and evaluate each variant on the RefCOCO validation set. Tab. 1 compares our results with methods that use ResNet-101 as the image backbone, and Tab. 2 compares them with methods that use Darknet-53. With the same image backbone, CoupAlign still surpasses previous methods, which indicates that it is compatible with popular backbones. In these experiments, we use four WPA modules: two in the early encoding stage and two in the late encoding stage.
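When the backbone is swapped, the four WPA modules have to attach to feature maps of different widths. The following is a minimal sketch of how such an assignment could be planned, not the CoupAlign implementation. It rests on several assumptions: one WPA module per backbone stage, the shallowest stages standing for the "early" encoding stage and the deepest for the "late" one, and 768-dimensional word features (for example, from a BERT-base text encoder). `WPASpec` and `plan_wpa_modules` are hypothetical names. The channel widths are the standard residual-stage outputs of ResNet-101 and Darknet-53.

```python
from dataclasses import dataclass
from typing import Dict, List

# Output channels of the residual stages of two standard backbones.
BACKBONE_STAGE_CHANNELS: Dict[str, List[int]] = {
    "resnet101": [256, 512, 1024, 2048],     # conv2_x .. conv5_x
    "darknet53": [64, 128, 256, 512, 1024],  # five residual stages
}


@dataclass(frozen=True)
class WPASpec:
    """Hypothetical spec for one word-pixel alignment (WPA) module."""
    stage: int      # backbone stage whose feature map the module aligns
    phase: str      # "early" or "late" encoding stage
    pixel_dim: int  # channel width of that stage's feature map
    word_dim: int   # width of the word features (assumed value)


def plan_wpa_modules(backbone: str, word_dim: int = 768,
                     n_early: int = 2, n_late: int = 2) -> List[WPASpec]:
    """Assign WPA modules to backbone stages (hypothetical placement).

    Assumes one module per stage: the n_early shallowest stages form the
    early encoding stage and the n_late deepest stages form the late one.
    """
    channels = BACKBONE_STAGE_CHANNELS[backbone]
    num_stages = len(channels)
    if n_early + n_late > num_stages:
        raise ValueError(
            f"{backbone} has {num_stages} stages; cannot place "
            f"{n_early + n_late} modules at one per stage")
    early = range(n_early)
    late = range(num_stages - n_late, num_stages)
    specs = [WPASpec(i, "early", channels[i], word_dim) for i in early]
    specs += [WPASpec(i, "late", channels[i], word_dim) for i in late]
    return specs
```

Under these assumptions, `plan_wpa_modules("resnet101")` attaches one module to each of the four ResNet-101 stages, while `plan_wpa_modules("darknet53")` skips the middle (256-channel) stage of Darknet-53. Only `pixel_dim` changes between backbones, so a per-module projection of pixel features to a shared width is one simple way to keep the alignment modules backbone-agnostic.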