Goto

Collaborating Authors

 spatial information







RG-SAN: Rule-Guided Spatial Awareness Network for End-to-End 3D Referring Expression Segmentation

Neural Information Processing Systems

However, traditional approaches frequently encounter issues like over-segmentation or mis-segmentation, due to insufficient emphasis on spatial information of instances. In this paper, we introduce a Rule-Guided Spatial Awareness Network (RG-SAN) by utilizing solely the spatial information of the target instance for supervision. This approach enables the network to accurately depict the spatial relationships among all entities described in the text, thus enhancing the reasoning capabilities. The RG-SAN consists of the Text-driven Localization Module (TLM) and the Rule-guided Weak Supervision (RWS) strategy. The TLM initially locates all mentioned instances and iteratively refines their positional information. The RWS strategy, acknowledging that only target objects have supervised positional information, employs dependency tree rules to precisely guide the core instance's positioning. Extensive testing on the ScanRefer benchmark has shown that RG-SAN not only establishes new performance benchmarks, with an mIoU increase of 5.1 points, but also exhibits significant improvements in robustness when processing descriptions with spatial ambiguity. All codes are available at https://github.com/sosppxo/RG-SAN.



RG-SAN: Rule-GuidedSpatialAwarenessNetworkfor End-to-End3DReferringExpressionSegmentation

Neural Information Processing Systems

TGNN[24]introduce3D-RESby extending the bounding box annotations of ScanRefer [5] to masks by incorporating the instance masks from ScanNet and proposed a two-stage pipeline. Further, 3D-STMN [65] proposed an end-to-end method that matches the text and superpoints to get the 3D segmentation of the target object directly.



Learning Topology-Agnostic EEG Representations with Geometry-Aware Modeling

Neural Information Processing Systems

Large-scale pre-training has shown great potential to enhance models on downstream tasks in vision and language. Developing similar techniques for scalp electroencephalogram (EEG) is suitable since unlabelled data is plentiful.