OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling

Neural Information Processing Systems 

In this paper, we propose OneRef, a minimalist referring framework built on the modality-shared one-tower transformer that unifies the visual and linguistic feature spaces.
