SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai, Lingfeng Yang
Neural Information Processing Systems
Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning.
Feb-18-2026, 09:25:51 GMT
- Country:
  - Asia > China
    - Heilongjiang Province > Daqing (0.04)
    - Jiangsu Province > Nanjing (0.04)
- Genre:
  - Research Report > Experimental Study (1.00)
- Technology:
  - Information Technology > Artificial Intelligence
    - Machine Learning > Neural Networks (0.68)
    - Natural Language > Large Language Model (0.93)
    - Vision (1.00)