AITopics | Sui, Wei

Collaborating Authors

Sui, Wei

GeoFlow-SLAM: A Robust Tightly-Coupled RGBD-Inertial Fusion SLAM for Dynamic Legged Robotics

Xiao, Tingyang, Zhou, Xiaolin, Liu, Liu, Sui, Wei, Feng, Wei, Qiu, Jiaxiong, Wang, Xinjie, Su, Zhizhong

arXiv.org Artificial Intelligence, Mar-18-2025

This paper presents GeoFlow-SLAM, a robust and effective tightly-coupled RGBD-inertial SLAM for legged robots operating in highly dynamic environments. By integrating geometric consistency, legged odometry constraints, and dual-stream optical flow (GeoFlow), our method addresses three critical challenges: feature-matching failures and pose-initialization failures during fast locomotion, and visual feature scarcity in texture-less scenes. Specifically, in rapid motion scenarios, feature matching is notably enhanced by leveraging dual-stream optical flow, which combines prior map points and poses. Additionally, we propose a pose initialization method that is robust to fast locomotion and IMU error in legged robots, integrating IMU/legged odometry, inter-frame Perspective-n-Point (PnP), and Generalized Iterative Closest Point (GICP). Furthermore, a novel optimization framework that tightly couples depth-to-map and GICP geometric constraints is introduced to improve robustness and accuracy in long-duration, visually texture-less environments. The proposed algorithms achieve state-of-the-art (SOTA) results on our collected legged-robot data and on open-source datasets. To further promote research and development, the open-source datasets and code will be made publicly available at https://github.com/NSN-Hello/GeoFlow-SLAM

constraint, feature point, information, (17 more...)

2503.14247

Country: Asia > China (0.14)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.91)

A Light-Weight Framework for Open-Set Object Detection with Decoupled Feature Alignment in Joint Space

He, Yonghao, Su, Hu, Yu, Haiyong, Yang, Cong, Sui, Wei, Wang, Cong, Liu, Song

arXiv.org Artificial Intelligence, Dec-25-2024

Open-set object detection (OSOD) is highly desirable for robotic manipulation in unstructured environments. However, existing OSOD methods often fail to meet the requirements of robotic applications due to their high computational burden and complex deployment. To address this issue, this paper proposes a light-weight framework called Decoupled OSOD (DOSOD), a practical and highly efficient solution for real-time OSOD tasks in robotic systems. Specifically, DOSOD builds upon the YOLO-World pipeline by integrating a vision-language model (VLM) with a detector. A Multilayer Perceptron (MLP) adaptor is developed to transform text embeddings extracted by the VLM into a joint space, within which the detector learns the region representations of class-agnostic proposals. Cross-modality features are directly aligned in the joint space, avoiding complex feature interactions and thereby improving computational efficiency. DOSOD operates like a traditional closed-set detector during the testing phase, effectively bridging the gap between closed-set and open-set detection. Compared to the baseline YOLO-World, the proposed DOSOD significantly enhances real-time performance while maintaining comparable accuracy. The small DOSOD-S model achieves a Fixed AP of 26.7%, compared to 26.2% for YOLO-World-v1-S and 22.7% for YOLO-World-v2-S, using similar backbones on the LVIS minival dataset. Meanwhile, the FPS of DOSOD-S is 57.1% higher than YOLO-World-v1-S and 29.6% higher than YOLO-World-v2-S. We also demonstrate that the DOSOD model facilitates deployment on edge devices. The code and models are publicly available at https://github.com/D-Robotics-AI-Lab/DOSOD.
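
The decoupled alignment described in this abstract can be illustrated with a toy sketch. It is not the DOSOD implementation: the two-layer adaptor, the weights, the embedding sizes, and the names `mlp_adaptor` and `classify_regions` are all hypothetical. It only shows why the text branch can be run once offline, after which scoring region features against the cached class embeddings is a plain dot product, much like a closed-set classifier head.

```python
import math

def matvec(w, v):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]

def relu(v):
    return [max(0.0, x) for x in v]

def l2_normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def mlp_adaptor(text_emb, w1, w2):
    """Hypothetical two-layer MLP mapping a text embedding into the joint space."""
    return l2_normalize(matvec(w2, relu(matvec(w1, text_emb))))

def classify_regions(region_feats, class_embs):
    """Return, for each region feature, the index of the most similar class embedding."""
    best = []
    for feat in region_feats:
        f = l2_normalize(feat)
        scores = [sum(a * b for a, b in zip(f, c)) for c in class_embs]
        best.append(max(range(len(scores)), key=scores.__getitem__))
    return best

# Toy weights and embeddings (assumed values, for illustration only).
W1 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # 3-d text embedding -> 2-d hidden
W2 = [[1.0, 0.0], [0.0, 1.0]]             # 2-d hidden -> 2-d joint space
TEXT_EMBS = {"cup": [1.0, 0.1, 0.0], "box": [0.1, 1.0, 0.0]}

# Offline step: run the text branch once and cache the result.
CLASS_NAMES = list(TEXT_EMBS)
CLASS_EMBS = [mlp_adaptor(TEXT_EMBS[name], W1, W2) for name in CLASS_NAMES]

# Online step: only region features and dot products are needed.
REGION_FEATS = [[0.9, 0.2], [0.1, 0.8]]
PREDICTIONS = [CLASS_NAMES[i] for i in classify_regions(REGION_FEATS, CLASS_EMBS)]
```

In this toy setup the vocabulary can be changed by recomputing `CLASS_EMBS` alone; the region-scoring path stays untouched.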

artificial intelligence, detection, machine learning, (17 more...)

2412.1468

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)

Gyroscope-Assisted Motion Deblurring Network

Luan, Simin, Yang, Cong, Boukhers, Zeyd, Qin, Xue, Cheng, Dongfeng, Sui, Wei, Li, Zhijun

arXiv.org Artificial Intelligence, Feb-9-2024

Deblurring networks have attracted substantial attention in image research in recent years. Yet their practical usage in real-world deblurring, especially of motion blur, remains limited due to the lack of pixel-aligned training triplets (background, blurred image, and blur heat map) and the restricted information inherent in blurred images. This paper presents a simple yet efficient framework to synthesize and restore motion-blurred images using Inertial Measurement Unit (IMU) data. Notably, the framework includes a strategy for training triplet generation and a Gyroscope-Aided Motion Deblurring (GAMD) network for blurred image restoration. The rationale is that by harnessing IMU data, we can determine the transformation of the camera pose during the image exposure phase, which lets us deduce the motion trajectory (a.k.a. blur trajectory) of each point in three-dimensional space. Thus, the triplets synthesized with our strategy are inherently close to natural motion blur, strictly pixel-aligned, and mass-producible. Through comprehensive experiments, we demonstrate the advantages of the proposed framework: errors of only two pixels between our synthetic and real-world blur trajectories, and a marked improvement (around 33.17%) on the state-of-the-art deblurring method MIMO in Peak Signal-to-Noise Ratio (PSNR).
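
A minimal sketch of how gyroscope data can yield a blur trajectory, under simplifying assumptions that are not taken from the paper: rotation only (no translation), a constant angular rate about the camera's y-axis during the exposure, and a pinhole camera with focal length `f` and principal point `(cx, cy)`. The function names and the sign convention of the rotation are hypothetical.

```python
import math

def rotate_y(ray, theta):
    """Rotate a 3D viewing ray by theta radians about the y-axis."""
    x, y, z = ray
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)

def blur_trajectory(u, v, omega_y, exposure, f, cx, cy, steps=5):
    """Pixel positions swept by one image point during the exposure.

    omega_y  : angular rate about the y-axis in rad/s (assumed constant)
    exposure : exposure time in seconds
    """
    ray = ((u - cx) / f, (v - cy) / f, 1.0)      # back-project the pixel
    track = []
    for i in range(steps + 1):
        theta = omega_y * exposure * i / steps   # integrated gyro angle so far
        x, y, z = rotate_y(ray, theta)
        track.append((f * x / z + cx, f * y / z + cy))  # re-project
    return track

# Example: 0.2 rad/s for a 20 ms exposure with f = 500 px.
TRACK = blur_trajectory(320.0, 240.0, 0.2, 0.02, 500.0, 320.0, 240.0)
BLUR_LENGTH_PX = TRACK[-1][0] - TRACK[0][0]
```

For the principal-point pixel the horizontal displacement is `f * tan(theta)`, so the example sweeps roughly two pixels. A full treatment would use the three-axis gyroscope readings and the scene depth, which this sketch leaves out.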

artificial intelligence, machine learning, trajectory, (16 more...)

2402.06854

Genre: Research Report (0.64)

Industry: Media > Photography (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)
Information Technology > Sensing and Signal Processing > Image Processing (0.88)

Towards Accurate Ground Plane Normal Estimation from Ego-Motion

Zhang, Jiaxin, Sui, Wei, Zhang, Qian, Chen, Tao, Yang, Cong

arXiv.org Artificial Intelligence, Dec-8-2022

In this paper, we introduce a novel approach for ground plane normal estimation of wheeled vehicles. In practice, the ground plane changes dynamically due to braking and unstable road surfaces. As a result, the vehicle pose, especially the pitch angle, oscillates, with amplitudes ranging from subtle to obvious. Thus, estimating the ground plane normal is meaningful since it can be encoded to improve the robustness of various autonomous driving tasks (e.g., 3D object detection, road surface reconstruction, and trajectory planning). Our proposed method uses only odometry as input and estimates accurate ground plane normal vectors in real time. In particular, it fully utilizes the underlying connection between the ego pose odometry (ego-motion) and its nearby ground plane. Built on that, an Invariant Extended Kalman Filter (IEKF) is designed to estimate the normal vector in the sensor's coordinate system. Thus, our proposed method is simple yet efficient and supports both camera- and inertial-based odometry algorithms. Its usability and the marked improvement in robustness are validated through multiple experiments on public datasets. For instance, we achieve state-of-the-art accuracy on the KITTI dataset with an estimated vector error of 0.39°. Our code is available at github.com/manymuch/ground_normal_filter.
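
The abstract reports the result as a vector error in degrees. One plausible way to compute such an error is the angle between the estimated and reference normal vectors, sketched below. The exact evaluation protocol is an assumption here, and `angular_error_deg` is a hypothetical helper, not a function from the authors' repository.

```python
import math

def angular_error_deg(n_est, n_ref):
    """Angle in degrees between two 3D vectors (they need not be unit length)."""
    dot = sum(a * b for a, b in zip(n_est, n_ref))
    norm = math.sqrt(sum(a * a for a in n_est)) * math.sqrt(sum(b * b for b in n_ref))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / norm))
    return math.degrees(math.acos(cos_angle))

# Example: a reference normal and an estimate pitched by 0.39 degrees.
PITCH = math.radians(0.39)
N_REF = (0.0, 1.0, 0.0)
N_EST = (0.0, math.cos(PITCH), math.sin(PITCH))
ERROR_DEG = angular_error_deg(N_EST, N_REF)
```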

artificial intelligence, ground plane, machine learning, (14 more...)

doi: 10.3390/s22239375

2212.04224

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Alaska (0.28)

Genre: Research Report > Promising Solution (0.34)

Industry:

Information Technology (0.68)
Transportation > Ground > Road (0.36)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.50)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Monocular Road Planar Parallax Estimation

Yuan, Haobo, Chen, Teng, Sui, Wei, Xie, Jiafeng, Zhang, Lefei, Li, Yuan, Zhang, Qian

arXiv.org Artificial Intelligence, Nov-22-2021

Estimating the 3D structure of the drivable surface and surrounding environment is a crucial task for assisted and autonomous driving. It is commonly solved either by using expensive 3D sensors such as LiDAR or by directly predicting the depth of points via deep learning. Instead of following existing methodologies, we propose the Road Planar Parallax Attention Network (RPANet), a new deep neural network for 3D sensing from monocular image sequences based on planar parallax, which takes full advantage of the road plane geometry commonly seen in driving scenes. RPANet takes a pair of images aligned by the homography of the road plane as input and outputs a γ map for 3D reconstruction. Beyond estimating depth or height, the γ map can be used to construct a two-dimensional transformation between two consecutive frames, and it can easily be converted to depth or height. By warping the consecutive frames using the road plane as a reference, the 3D structure can be estimated from the planar parallax and the residual image displacements. Furthermore, to make the network better perceive the displacements caused by planar parallax, we introduce a novel cross-attention module. We sample data from the Waymo Open Dataset and construct data related to planar parallax. Comprehensive experiments are conducted on the sampled dataset to demonstrate the 3D reconstruction accuracy of our approach in challenging scenarios.
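
The abstract does not define γ. A common definition in the planar parallax literature, assumed here, is the ratio of a point's height above the road plane to its depth, γ = h / z. Under that assumption, and with a pinhole camera at height `cam_height` above a plane whose unit normal `normal` points from the camera toward the road, depth follows from z = cam_height / (γ + n·r), where r is the pixel's viewing ray with unit z-component. The function names and all numeric values below are hypothetical.

```python
def depth_from_gamma(gamma, u, v, f, cx, cy, normal, cam_height):
    """Depth z of a pixel given gamma = height / depth (assumed definition).

    normal     : unit road-plane normal in camera coordinates, pointing
                 from the camera toward the road
    cam_height : distance from the camera centre to the road plane
    """
    ray = ((u - cx) / f, (v - cy) / f, 1.0)   # viewing ray with z = 1
    denom = gamma + sum(n * r for n, r in zip(normal, ray))
    if denom <= 0.0:
        raise ValueError("ray does not reach a point consistent with this gamma")
    return cam_height / denom

def height_from_gamma(gamma, depth):
    """Height above the road plane, from the same assumed definition."""
    return gamma * depth

# Example: camera 1.5 m above the road, y-axis pointing down (toward the road).
NORMAL = (0.0, 1.0, 0.0)
DEPTH = depth_from_gamma(0.05, 640.0, 460.0, 1000.0, 640.0, 360.0, NORMAL, 1.5)
HEIGHT = height_from_gamma(0.05, DEPTH)
```

With these values the point lies about 10 m ahead and 0.5 m above the road; a pixel with γ = 0 lies on the road plane itself.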

artificial intelligence, machine learning, proceedings, (14 more...)

2111.11089

Country:

Asia (0.46)
North America > United States > Michigan (0.14)

Genre: Research Report (0.64)

Industry: Transportation > Ground > Road (0.48)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
