SimVG: A Simple Framework for Visual Grounding with Decoupled Multi-modal Fusion

Ming Dai, Lingfeng Yang

Neural Information Processing Systems

Visual grounding is a common vision task that involves grounding descriptive sentences to the corresponding regions of an image. Most existing methods use independent image-text encoding and apply complex hand-crafted modules or encoder-decoder architectures for modal interaction and query reasoning.








Robust Contrastive Multi-view Clustering against Dual Noisy Correspondence

Neural Information Processing Systems

Recently, contrastive multi-view clustering (MvC) has emerged as a promising avenue for analyzing data from heterogeneous sources, typically using the off-the-shelf instances as positives and randomly sampled ones as negatives. In practice, however, this paradigm unavoidably suffers from the Dual Noisy Correspondence (DNC) problem, where noise compromises the construction of both positive and negative pairs.