AITopics | Wu, Yanmin

Collaborating Authors

Wu, Yanmin

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

OpenGaussian: Towards Point-Level 3D Gaussian-based Open Vocabulary Understanding

Wu, Yanmin, Meng, Jiarui, Li, Haijie, Wu, Chenming, Shi, Yahao, Cheng, Xinhua, Zhao, Chen, Feng, Haocheng, Ding, Errui, Wang, Jingdong, Zhang, Jian

arXiv.org Artificial IntelligenceJun-4-2024

This paper introduces OpenGaussian, a method based on 3D Gaussian Splatting (3DGS) capable of 3D point-level open vocabulary understanding. Our primary motivation stems from observing that existing 3DGS-based open vocabulary methods mainly focus on 2D pixel-level parsing. These methods struggle with 3D point-level tasks due to weak feature expressiveness and inaccurate 2D-3D feature associations. To ensure robust feature presentation and 3D point-level understanding, we first employ SAM masks without cross-frame associations to train instance features with 3D consistency. These features exhibit both intra-object consistency and inter-object distinction. Then, we propose a two-stage codebook to discretize these features from coarse to fine levels. At the coarse level, we consider the positional information of 3D points to achieve location-based clustering, which is then refined at the fine level. Finally, we introduce an instance-level 3D-2D feature association method that links 3D points to 2D masks, which are further associated with 2D CLIP features. Extensive experiments, including open vocabulary-based 3D object selection, 3D point cloud understanding, click-based 3D object selection, and ablation studies, demonstrate the effectiveness of our proposed method. Project page: https://3d-aigc.github.io/OpenGaussian

arxiv preprint arxiv, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2406.02058

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

PAS-SLAM: A Visual SLAM System for Planar Ambiguous Scenes

Hu, Xinggang, Wu, Yanmin, Zhao, Mingyuan, Yang, Linghao, Zhang, Xiangkui, Ji, Xiangyang

arXiv.org Artificial IntelligenceFeb-8-2024

Visual SLAM (Simultaneous Localization and Mapping) based on planar features has found widespread applications in fields such as environmental structure perception and augmented reality. However, current research faces challenges in accurately localizing and mapping in planar ambiguous scenes, primarily due to the poor accuracy of the employed planar features and data association methods. In this paper, we propose a visual SLAM system based on planar features designed for planar ambiguous scenes, encompassing planar processing, data association, and multi-constraint factor graph optimization. We introduce a planar processing strategy that integrates semantic information with planar features, extracting the edges and vertices of planes to be utilized in tasks such as plane selection, data association, and pose optimization. Next, we present an integrated data association strategy that combines plane parameters, semantic information, projection IoU (Intersection over Union), and non-parametric tests, achieving accurate and robust plane data association in planar ambiguous scenes. Finally, we design a set of multi-constraint factor graphs for camera pose optimization. Qualitative and quantitative experiments conducted on publicly available datasets demonstrate that our proposed system competes effectively in both accuracy and robustness in terms of map construction and camera localization compared to state-of-the-art methods.

artificial intelligence, information fusion, plane, (17 more...)

arXiv.org Artificial Intelligence

2402.06131

Country: Asia > China > Liaoning Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.49)
Information Technology > Human Computer Interaction > Interfaces > Virtual Reality (0.35)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.34)

Add feedback

Language-Assisted 3D Scene Understanding

Wu, Yanmin, Gao, Qiankun, Zhang, Renrui, Zhang, Jian

arXiv.org Artificial IntelligenceDec-31-2023

The scale and quality of point cloud datasets constrain the advancement of point cloud learning. Recently, with the development of multi-modal learning, the incorporation of domain-agnostic prior knowledge from other modalities, such as images and text, to assist in point cloud feature learning has been considered a promising avenue. Existing methods have demonstrated the effectiveness of multi-modal contrastive training and feature distillation on point clouds. However, challenges remain, including the requirement for paired triplet data, redundancy and ambiguity in supervised features, and the disruption of the original priors. In this paper, we propose a language-assisted approach to point cloud feature learning (LAST-PCL), enriching semantic concepts through LLMs-based text enrichment. We achieve de-redundancy and feature dimensionality reduction without compromising textual priors by statistical-based and training-free significant feature selection. Furthermore, we also delve into an in-depth analysis of the impact of text contrastive training on the point cloud. Extensive experiments validate that the proposed method learns semantically meaningful point cloud features and achieves state-of-the-art or comparable performance in 3D semantic segmentation, 3D object detection, and 3D scene classification tasks.

category, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2312.11451

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling

Yang, Linghao, Wu, Yanmin, Deng, Yu, Tian, Rui, Hu, Xinggang, Ma, Tiefeng

arXiv.org Artificial IntelligenceOct-2-2023

Tracking and modeling unknown rigid objects in the environment play a crucial role in autonomous unmanned systems and virtual-real interactive applications. However, many existing Simultaneous Localization, Mapping and Moving Object Tracking (SLAMMOT) methods focus solely on estimating specific object poses and lack estimation of object scales and are unable to effectively track unknown objects. In this paper, we propose a novel SLAM backend that unifies ego-motion tracking, rigid object motion tracking, and modeling within a joint optimization framework. In the perception part, we designed a pixel-level asynchronous object tracker (AOT) based on the Segment Anything Model (SAM) and DeAOT, enabling the tracker to effectively track target unknown objects guided by various predefined tasks and prompts. In the modeling part, we present a novel object-centric quadric parameterization to unify both static and dynamic object initialization and optimization. Subsequently, in the part of object state estimation, we propose a tightly coupled optimization model for object pose and scale estimation, incorporating hybrids constraints into a novel dual sliding window optimization framework for joint estimation. To our knowledge, we are the first to tightly couple object pose tracking with light-weight modeling of dynamic and static objects using quadric. We conduct qualitative and quantitative experiments on simulation datasets and real-world datasets, demonstrating the state-of-the-art robustness and accuracy in motion estimation and modeling. Our system showcases the potential application of object perception in complex dynamic scenes.

artificial intelligence, image understanding, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2309.17036

Country: Asia > China (0.68)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision > Image Understanding (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.74)

Add feedback

An Object SLAM Framework for Association, Mapping, and High-Level Tasks

Wu, Yanmin, Zhang, Yunzhou, Zhu, Delong, Deng, Zhiqiang, Sun, Wenkai, Chen, Xin, Zhang, Jian

arXiv.org Artificial IntelligenceMay-12-2023

Object SLAM is considered increasingly significant for robot high-level perception and decision-making. Existing studies fall short in terms of data association, object representation, and semantic mapping and frequently rely on additional assumptions, limiting their performance. In this paper, we present a comprehensive object SLAM framework that focuses on object-based perception and object-oriented robot tasks. First, we propose an ensemble data association approach for associating objects in complicated conditions by incorporating parametric and nonparametric statistic testing. In addition, we suggest an outlier-robust centroid and scale estimation algorithm for modeling objects based on the iForest and line alignment. Then a lightweight and object-oriented map is represented by estimated general object models. Taking into consideration the semantic invariance of objects, we convert the object map to a topological map to provide semantic descriptors to enable multi-map matching. Finally, we suggest an object-driven active exploration strategy to achieve autonomous mapping in the grasping scenario. A range of public datasets and real-world results in mapping, augmented reality, scene matching, relocalization, and robotic manipulation have been used to evaluate the proposed object SLAM framework for its efficient performance.

artificial intelligence, high-level task, object slam framework, (2 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/TRO.2023.3273180

2305.07299

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.73)

Add feedback

EDA: Explicit Text-Decoupling and Dense Alignment for 3D Visual Grounding

Wu, Yanmin, Cheng, Xinhua, Zhang, Renrui, Cheng, Zesen, Zhang, Jian

arXiv.org Artificial IntelligenceApr-24-2023

3D visual grounding aims to find the object within point clouds mentioned by free-form natural language descriptions with rich semantic cues. However, existing methods either extract the sentence-level features coupling all words or focus more on object names, which would lose the word-level information or neglect other attributes. To alleviate these issues, we present EDA that Explicitly Decouples the textual attributes in a sentence and conducts Dense Alignment between such fine-grained language and point cloud objects. Specifically, we first propose a text decoupling module to produce textual features for every semantic component. Then, we design two losses to supervise the dense matching between two modalities: position alignment loss and semantic alignment loss. On top of that, we further introduce a new visual grounding task, locating objects without object names, which can thoroughly evaluate the model's dense alignment capacity. Through experiments, we achieve state-of-the-art performance on two widely-adopted 3D visual grounding datasets, ScanRefer and SR3D/NR3D, and obtain absolute leadership on our newly-proposed task. The source code is available at https://github.com/yanmin-wu/EDA.

artificial intelligence, explicit text-decoupling and dense alignment, natural language, (2 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/CVPR52729.2023.01843

2209.14941

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.87)

Add feedback