AITopics | Su, Zhizhong

Su, Zhizhong

GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial Fusion SLAM for Dynamic Legged Robotics

Xiao, Tingyang, Zhou, Xiaolin, Liu, Liu, Sui, Wei, Feng, Wei, Qiu, Jiaxiong, Wang, Xinjie, Su, Zhizhong

arXiv.org Artificial Intelligence, Mar-18-2025

This paper presents GeoFlow-SLAM, a robust and effective tightly-coupled RGBD-inertial SLAM for legged robots operating in highly dynamic environments. By integrating geometric consistency, legged odometry constraints, and dual-stream optical flow (GeoFlow), our method addresses three critical challenges: feature matching failures and pose initialization failures during fast locomotion, and visual feature scarcity in texture-less scenes. Specifically, in rapid motion scenarios, feature matching is notably enhanced by leveraging dual-stream optical flow, which combines prior map points and poses. Additionally, we propose a robust pose initialization method for fast locomotion and IMU error in legged robots, integrating IMU/legged odometry, inter-frame Perspective-n-Point (PnP), and Generalized Iterative Closest Point (GICP). Furthermore, a novel optimization framework that tightly couples depth-to-map and GICP geometric constraints is introduced for the first time to improve robustness and accuracy in long-duration, visually texture-less environments. The proposed algorithms achieve state-of-the-art (SOTA) results on data collected from legged robots and on open-source datasets. To further promote research and development, the open-source datasets and code will be made publicly available at https://github.com/NSN-Hello/GeoFlow-SLAM

constraint, feature point, information, (17 more...)

arXiv: 2503.14247

Country: Asia > China (0.14)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.91)

Gaussian Object Carver: Object-Compositional Gaussian Splatting with surfaces completion

Liu, Liu, Wang, Xinjie, Qiu, Jiaxiong, Lin, Tianwei, Zhou, Xiaolin, Su, Zhizhong

arXiv.org Artificial Intelligence, Dec-2-2024

3D scene reconstruction is a foundational problem in computer vision. Despite recent advancements in Neural Implicit Representations (NIR), existing methods often lack editability and compositional flexibility, limiting their use in scenarios requiring high interactivity and object-level manipulation. In this paper, we introduce the Gaussian Object Carver (GOC), a novel, efficient, and scalable framework for object-compositional 3D scene reconstruction. GOC leverages 3D Gaussian Splatting (GS), enriched with monocular geometry priors and multi-view geometry regularization, to achieve high-quality and flexible reconstruction. Furthermore, we propose a zero-shot Object Surface Completion (OSC) model, which uses 3D priors from 3D object data to reconstruct unobserved surfaces, ensuring object completeness even in occluded areas. Experimental results demonstrate that GOC improves reconstruction efficiency and geometric fidelity. It holds promise for advancing the practical application of digital twins in embodied AI, AR/VR, and interactive simulation environments.

artificial intelligence, machine learning, reconstruction, (15 more...)

arXiv: 2412.02075

Country:

North America > United States > Texas (0.14)
Asia > China (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

BIP3D: Bridging 2D Images and 3D Perception for Embodied Intelligence

Lin, Xuewu, Lin, Tianwei, Huang, Lichao, Xie, Hongyu, Su, Zhizhong

arXiv.org Artificial Intelligence, Nov-27-2024

In embodied intelligence systems, a key component is the 3D perception algorithm, which enables agents to understand their surrounding environments. Previous algorithms primarily rely on point clouds, which, despite offering precise geometric information, still constrain perception performance due to inherent sparsity, noise, and data scarcity. In this work, we introduce a novel image-centric 3D perception model, BIP3D, which leverages expressive image features with explicit 3D position encoding to overcome the limitations of point-centric methods. Specifically, we leverage pre-trained 2D vision foundation models to enhance semantic understanding, and introduce a spatial enhancer module to improve spatial understanding. Together, these modules enable BIP3D to achieve multi-view, multi-modal feature fusion and end-to-end 3D perception. In our experiments, BIP3D outperforms current state-of-the-art results on the EmbodiedScan benchmark, achieving improvements of 5.69% in the 3D detection task and 15.25% in the 3D visual grounding task.

detection, machine learning, natural language, (18 more...)

arXiv: 2411.14869

Country: Asia > China (0.14)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Sparse4D v3: Advancing End-to-End 3D Detection and Tracking

Lin, Xuewu, Pei, Zixiang, Lin, Tianwei, Huang, Lichao, Su, Zhizhong

arXiv.org Artificial Intelligence, Nov-20-2023

In autonomous driving perception systems, 3D detection and tracking are the two fundamental tasks. This paper delves deeper into this field, building upon the Sparse4D framework. We introduce two auxiliary training tasks (Temporal Instance Denoising and Quality Estimation) and propose decoupled attention to make structural improvements, leading to significant enhancements in detection performance. Additionally, we extend the detector into a tracker using a straightforward approach that assigns instance ID during inference, further highlighting the advantages of query-based algorithms. Extensive experiments conducted on the nuScenes benchmark validate the effectiveness of the proposed improvements. With ResNet50 as the backbone, we witnessed enhancements of 3.0%, 2.2%, and 7.6% in mAP, NDS, and AMOTA, achieving 46.9%, 56.1%, and 49.0%, respectively. Our best model achieved 71.9% NDS and 67.7% AMOTA on the nuScenes test set.
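The idea of turning a query-based detector into a tracker by assigning instance IDs at inference time can be illustrated with a minimal sketch. This is not the authors' implementation: the `Instance` class, the `assign_ids` helper, and the confidence threshold of 0.25 are hypothetical names and values chosen for illustration. The sketch assumes that the same query objects are propagated from frame to frame, so an ID only needs to be handed out once, when an instance first becomes confident.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterable, Iterator, Optional


@dataclass
class Instance:
    """One detection query carried across frames (hypothetical structure)."""
    confidence: float
    track_id: Optional[int] = None  # None until the instance is promoted to a track


def assign_ids(instances: Iterable[Instance],
               id_source: Iterator[int],
               threshold: float = 0.25) -> None:
    """Give a fresh ID to every confident instance that does not have one yet.

    The threshold value is an assumption made for this sketch.
    """
    for inst in instances:
        if inst.track_id is None and inst.confidence >= threshold:
            inst.track_id = next(id_source)


id_source = count(start=0)

# Frame 1: two queries, only one is confident enough to become a track.
a, b = Instance(0.9), Instance(0.1)
assign_ids([a, b], id_source)

# Frame 2: the same query objects are propagated; b gains confidence, c is new.
b.confidence = 0.6
c = Instance(0.8)
assign_ids([a, b, c], id_source)

print(a.track_id, b.track_id, c.track_id)  # prints: 0 1 2
```

Because an existing ID is never overwritten, no separate data-association step appears in this sketch; identity continuity rests entirely on the assumed frame-to-frame propagation of queries.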

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv: 2311.11722

Country: Asia > China (0.14)

Genre: Research Report (1.00)

Industry:

Information Technology (0.34)
Transportation > Ground > Road (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Vision (0.92)
Information Technology > Artificial Intelligence > Robots (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
