Supplementary Material: Segment Anything in High Quality

Neural Information Processing Systems 

In this supplementary material, Section 1 first presents the additional experimental analysis of our HQ-SAM, including more zero-shot transfer comparisons to SAM on both image and video benchmarks.

SAM vs. HQ-SAM on Various Backbones

In Table 1, we provide a comprehensive comparison

Table 2: Results on YouTubeVIS 2019 validation set and HQ-YTVIS test set using ViT-L based SAM.

In Table 2, HQ-SAM achieves consistent gains of 1.4 points in Tube Mask AP,

Robustness to Input Box Prompts

In Table 4, we compare HQ-SAM to SAM by adding various scales of noise to the input ground truth box prompts.

"center" point of Ground Truth (GT) masks, which is at a maximal value location in a mask's interior

Results not obtained in a zero-shot manner (i.e. the training

HQ-SAM improves over SAM, but still cannot achieve fully correct mask prediction.

HQ-SAM produces significantly more accurate boundaries.
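
To make the box-prompt robustness comparison concrete, the following is a minimal sketch of how a ground truth box might be perturbed at a given noise scale before being passed to a segmentation model. The text does not specify the noise distribution, so the scheme here is an assumption: each coordinate is shifted by a uniform offset proportional to the box side length. The names `jitter_box` and `box_iou` are hypothetical helpers, not part of SAM or HQ-SAM.

```python
import random


def jitter_box(box, noise_scale, width, height, rng=None):
    """Perturb an (x0, y0, x1, y1) box; assumed scheme, not the paper's.

    Each coordinate is shifted by a uniform offset in
    [-noise_scale * side, +noise_scale * side], then clipped to the image.
    """
    rng = rng or random.Random()
    x0, y0, x1, y1 = box
    bw, bh = x1 - x0, y1 - y0

    def shift(value, side, limit):
        offset = rng.uniform(-noise_scale * side, noise_scale * side)
        # Clip so the noisy coordinate stays inside the image.
        return min(max(value + offset, 0.0), float(limit))

    nx0, nx1 = sorted((shift(x0, bw, width), shift(x1, bw, width)))
    ny0, ny1 = sorted((shift(y0, bh, height), shift(y1, bh, height)))
    return (nx0, ny0, nx1, ny1)


def box_iou(a, b):
    """Intersection over union of two (x0, y0, x1, y1) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    gt_box = (10, 20, 50, 80)
    rng = random.Random(0)  # fixed seed so repeated runs are comparable
    for scale in (0.0, 0.1, 0.2, 0.4):
        noisy = jitter_box(gt_box, scale, width=100, height=100, rng=rng)
        print(scale, noisy, round(box_iou(gt_box, noisy), 3))
```

Reporting the IoU between the noisy box and the original gives a rough sense of how far each noise scale moves the prompt away from the ground truth, independent of any model.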