SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion
Ming Dai, Lingfeng Yang

Feb-18-2026, 09:25:51 GMT–Neural Information Processing Systems

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning.

computer vision, large language model, machine learning

Country:
- Asia > China
  - Heilongjiang Province > Daqing (0.04)
  - Jiangsu Province > Nanjing (0.04)

Genre:
- Research Report > Experimental Study (1.00)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Natural Language > Large Language Model (0.93)
  - Machine Learning > Neural Networks (0.68)
