SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion Ming Dai 1, Lingfeng Y ang

Open in new window