Referencing Where to Focus: Improving Visual Grounding with Referential Query
Yabing Wang
–Neural Information Processing Systems
Furthermore, they only use the deepest image feature during the query learning process, overlooking the importance of features from other levels. To address these issues, we propose a novel approach called RefFormer. It consists of a task-specific decoder and a query adaption module that can be seamlessly integrated into CLIP and generates the referential query, which provides prior context for the decoder.
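To illustrate the idea of learning queries from several feature levels rather than only the deepest one, here is a minimal sketch in plain Python. It is not the authors' implementation: the function names (`cross_attend`, `refine_queries`), the single-head attention without learned projections, the residual update, and the shallow-to-deep ordering are all assumptions made for illustration, and the random vectors merely stand in for CLIP features.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, features):
    """Update each query with a scaled dot-product attention readout.

    Hypothetical simplification: one head, no learned projections,
    and a plain residual connection.
    """
    dim = len(queries[0])
    updated = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, f)) / math.sqrt(dim) for f in features
        ]
        weights = softmax(scores)
        attended = [
            sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)
        ]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated


def refine_queries(queries, multi_level_features):
    """Refine the queries level by level (assumed order: shallow to deep)."""
    for features in multi_level_features:
        queries = cross_attend(queries, features)
    return queries


random.seed(0)
dim, num_queries, num_tokens, num_levels = 4, 2, 5, 3
queries = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_queries)]
levels = [
    [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_tokens)]
    for _ in range(num_levels)
]
referential_queries = refine_queries(queries, levels)
```

In this sketch the refined queries carry information from every feature level before they reach a decoder, which is the role the excerpt assigns to the referential query. How RefFormer actually combines the levels is not specified in the text above.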
Nov-18-2025, 06:26:55 GMT
- Country:
- Asia > China
- Guangdong Province > Shenzhen (0.04)
- Heilongjiang Province
- Shaanxi Province > Xi'an (0.04)
- Europe > Netherlands
- North Holland > Amsterdam (0.04)
- Genre:
- Research Report
- Experimental Study (1.00)
- New Finding (0.68)
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning
- Neural Networks (0.93)
- Statistical Learning (0.67)
- Natural Language (1.00)
- Vision (1.00)