AITopics | Sheng, Yu

Collaborating Authors

Sheng, Yu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MSGField: A Unified Scene Representation Integrating Motion, Semantics, and Geometry for Robotic Manipulation

Sheng, Yu, Lin, Runfeng, Wang, Lidian, Qiu, Quecheng, Zhang, YanYong, Zhang, Yu, Hua, Bei, Ji, Jianmin

arXiv.org Artificial IntelligenceOct-21-2024

Combining accurate geometry with rich semantics has been proven to be highly effective for language-guided robotic manipulation. Existing methods for dynamic scenes either fail to update in real-time or rely on additional depth sensors for simple scene editing, limiting their applicability in real-world. In this paper, we introduce MSGField, a representation that uses a collection of 2D Gaussians for high-quality reconstruction, further enhanced with attributes to encode semantic and motion information. Specially, we represent the motion field compactly by decomposing each primitive's motion into a combination of a limited set of motion bases. Leveraging the differentiable real-time rendering of Gaussian splatting, we can quickly optimize object motion, even for complex non-rigid motions, with image supervision from only two camera views. Additionally, we designed a pipeline that utilizes object priors to efficiently obtain well-defined semantics. In our challenging dataset, which includes flexible and extremely small objects, our method achieve a success rate of 79.2% in static and 63.3% in dynamic environments for language-guided manipulation. For specified object grasping, we achieve a success rate of 90%, on par with point cloud-based methods. Code and dataset will be released at:https://shengyu724.github.io/MSGField.github.io.

artificial intelligence, manipulation, msgfield, (12 more...)

arXiv.org Artificial Intelligence

2410.1573

Country: Asia > China > Anhui Province (0.14)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Robots (1.00)

Add feedback

Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes

Sheng, Yu, Zhang, Lu, Li, Xingchen, Duan, Yifan, Zhang, Yanyong, Zhang, Yu, Ji, Jianmin

arXiv.org Artificial IntelligenceApr-7-2024

Prior point cloud provides 3D environmental context, which enhances the capabilities of monocular camera in downstream vision tasks, such as 3D object detection, via data fusion. However, the absence of accurate and automated registration methods for estimating camera extrinsic parameters in roadside scene point clouds notably constrains the potential applications of roadside cameras. This paper proposes a novel approach for the automatic registration between prior point clouds and images from roadside scenes. The main idea involves rendering photorealistic grayscale views taken at specific perspectives from the prior point cloud with the help of their features like RGB or intensity values. These generated views can reduce the modality differences between images and prior point clouds, thereby improve the robustness and accuracy of the registration results. Particularly, we specify an efficient algorithm, named neighbor rendering, for the rendering process. Then we introduce a method for automatically estimating the initial guess using only rough guesses of camera's position. At last, we propose a procedure for iteratively refining the extrinsic parameters by minimizing the reprojection error for line features extracted from both generated and camera images using Segment Anything Model (SAM). We assess our method using a self-collected dataset, comprising eight cameras strategically positioned throughout the university campus. Experiments demonstrate our method's capability to automatically align prior point cloud with roadside camera image, achieving a rotation accuracy of 0.202 degrees and a translation precision of 0.079m. Furthermore, we validate our approach's effectiveness in visual applications by substantially improving monocular 3D object detection performance.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.05164

Country: Asia > China > Anhui Province (0.14)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Information Fusion (0.66)
(2 more...)

Add feedback

MM-Gaussian: 3D Gaussian-based Multi-modal Fusion for Localization and Reconstruction in Unbounded Scenes

Wu, Chenyang, Duan, Yifan, Zhang, Xinran, Sheng, Yu, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceApr-5-2024

Localization and mapping are critical tasks for various applications such as autonomous vehicles and robotics. The challenges posed by outdoor environments present particular complexities due to their unbounded characteristics. In this work, we present MM-Gaussian, a LiDAR-camera multi-modal fusion system for localization and mapping in unbounded scenes. Our approach is inspired by the recently developed 3D Gaussians, which demonstrate remarkable capabilities in achieving high rendering quality and fast rendering speed. Specifically, our system fully utilizes the geometric structure information provided by solid-state LiDAR to address the problem of inaccurate depth encountered when relying solely on visual solutions in unbounded, outdoor scenarios. Additionally, we utilize 3D Gaussian point clouds, with the assistance of pixel-level gradient descent, to fully exploit the color information in photos, thereby achieving realistic rendering effects. To further bolster the robustness of our system, we designed a relocalization module, which assists in returning to the correct trajectory in the event of a localization failure. Experiments conducted in multiple scenarios demonstrate the effectiveness of our method.

artificial intelligence, gaussian, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2404.04026

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)

Add feedback

EdgeCalib: Multi-Frame Weighted Edge Features for Automatic Targetless LiDAR-Camera Calibration

Li, Xingchen, Duan, Yifan, Wang, Beibei, Ren, Haojie, You, Guoliang, Sheng, Yu, Ji, Jianmin, Zhang, Yanyong

arXiv.org Artificial IntelligenceOct-25-2023

In multimodal perception systems, achieving precise extrinsic calibration between LiDAR and camera is of critical importance. Previous calibration methods often required specific targets or manual adjustments, making them both labor-intensive and costly. Online calibration methods based on features have been proposed, but these methods encounter challenges such as imprecise feature extraction, unreliable cross-modality associations, and high scene-specific requirements. To address this, we introduce an edge-based approach for automatic online calibration of LiDAR and cameras in real-world scenarios. The edge features, which are prevalent in various environments, are aligned in both images and point clouds to determine the extrinsic parameters. Specifically, stable and robust image edge features are extracted using a SAM-based method and the edge features extracted from the point cloud are weighted through a multi-frame weighting strategy for feature filtering. Finally, accurate extrinsic parameters are optimized based on edge correspondence constraints. We conducted evaluations on both the KITTI dataset and our dataset. The results show a state-of-the-art rotation accuracy of 0.086{\deg} and a translation accuracy of 0.977 cm, outperforming existing edge-based calibration methods in both precision and robustness.

artificial intelligence, automatic targetless lidar-camera calibration, multi-frame weighted edge feature, (1 more...)

arXiv.org Artificial Intelligence

2310.16629

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Vision (1.00)

Add feedback

Judicial Intelligent Assistant System: Extracting Events from Divorce Cases to Detect Disputes for the Judge

Zhang, Yuan, Li, Chuanyi, Sheng, Yu, Ge, Jidong, Luo, Bin

arXiv.org Artificial IntelligenceMar-23-2023

In formal procedure of civil cases, the textual materials provided by different parties describe the development process of the cases. It is a difficult but necessary task to extract the key information for the cases from these textual materials and to clarify the dispute focus of related parties. Currently, officers read the materials manually and use methods, such as keyword searching and regular matching, to get the target information. These approaches are time-consuming and heavily depending on prior knowledge and carefulness of the officers. To assist the officers to enhance working efficiency and accuracy, we propose an approach to detect disputes from divorce cases based on a two-round-labeling event extracting technique in this paper. We implement the Judicial Intelligent Assistant (JIA) system according to the proposed approach to 1) automatically extract focus events from divorce case materials, 2) align events by identifying co-reference among them, and 3) detect conflicts among events brought by the plaintiff and the defendant. With the JIA system, it is convenient for judges to determine the disputed issues. Experimental results demonstrate that the proposed approach and system can obtain the focus of cases and detect conflicts more effectively and efficiently comparing with existing method.

artificial intelligence, machine learning, trigger chunk, (17 more...)

arXiv.org Artificial Intelligence

2303.16751

Country: Asia > China (0.28)

Genre: Research Report > New Finding (0.66)

Industry:

Law > Litigation (1.00)
Health & Medicine > Therapeutic Area > Psychiatry/Psychology (0.93)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback