AITopics | Yan, Dongyu

Plotting

Yan, Dongyu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs

Xu, Xinli, Ge, Wenhang, Qiu, Dicong, Chen, ZhiFei, Yan, Dongyu, Liu, Zhuoyun, Zhao, Haoyu, Zhao, Hanfeng, Zhang, Shunsi, Liang, Junwei, Chen, Ying-Cong

arXiv.org Artificial IntelligenceDec-15-2024

Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-explored due to the inherent ambiguities in physical property estimation. To address these challenges, we introduce GaussianProperty, a training-free framework that assigns physical properties of materials to 3D Gaussians. Specifically, we integrate the segmentation capability of SAM with the recognition capability of GPT-4V(ision) to formulate a global-local physical property reasoning module for 2D images. Then we project the physical properties from multi-view 2D images to 3D Gaussians using a voting strategy. We demonstrate that 3D Gaussians with physical property annotations enable applications in physics-based dynamic simulation and robotic grasping. For physics-based dynamic simulation, we leverage the Material Point Method (MPM) for realistic dynamic simulation. For robot grasping, we develop a grasping force prediction strategy that estimates a safe force range required for object grasping based on the estimated physical properties. Extensive experiments on material segmentation, physics-based dynamic simulation, and robotic grasping validate the effectiveness of our proposed method, highlighting its crucial role in understanding physical properties from visual data. Online demo, code, more cases and annotated datasets are available on \href{https://Gaussian-Property.github.io}{this https URL}.

large language model, natural language, physical property, (18 more...)

arXiv.org Artificial Intelligence

2412.11258

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.70)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

MSI-NeRF: Linking Omni-Depth with View Synthesis through Multi-Sphere Image aided Generalizable Neural Radiance Field

Yan, Dongyu, Huang, Guanyu, Quan, Fengyu, Chen, Haoyao

arXiv.org Artificial IntelligenceMar-16-2024

Panoramic observation using fisheye cameras is significant in robot perception, reconstruction, and remote operation. However, panoramic images synthesized by traditional methods lack depth information and can only provide three degrees-of-freedom (3DoF) rotation rendering in virtual reality applications. To fully preserve and exploit the parallax information within the original fisheye cameras, we introduce MSI-NeRF, which combines deep learning omnidirectional depth estimation and novel view rendering. We first construct a multi-sphere image as a cost volume through feature extraction and warping of the input images. It is then processed by geometry and appearance decoders, respectively. Unlike methods that regress depth maps directly, we further build an implicit radiance field using spatial points and interpolated 3D feature vectors as input. In this way, we can simultaneously realize omnidirectional depth estimation and 6DoF view synthesis. Our method is trained in a semi-self-supervised manner. It does not require target view images and only uses depth data for supervision. Our network has the generalization ability to reconstruct unknown scenes efficiently using only four images. Experimental results show that our method outperforms existing methods in depth estimation and novel view synthesis tasks.

artificial intelligence, machine learning, representation, (20 more...)

arXiv.org Artificial Intelligence

2403.1084

Country: Asia > China (0.28)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Vision > Image Understanding (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

Active Implicit Object Reconstruction using Uncertainty-guided Next-Best-View Optimization

Yan, Dongyu, Liu, Jianheng, Quan, Fengyu, Chen, Haoyao, Fu, Mengmeng

arXiv.org Artificial IntelligenceAug-3-2023

Actively planning sensor views during object reconstruction is crucial for autonomous mobile robots. An effective method should be able to strike a balance between accuracy and efficiency. In this paper, we propose a seamless integration of the emerging implicit representation with the active reconstruction task. We build an implicit occupancy field as our geometry proxy. While training, the prior object bounding box is utilized as auxiliary information to generate clean and detailed reconstructions. To evaluate view uncertainty, we employ a sampling-based approach that directly extracts entropy from the reconstructed occupancy probability field as our measure of view information gain. This eliminates the need for additional uncertainty maps or learning. Unlike previous methods that compare view uncertainty within a finite set of candidates, we aim to find the next-best-view (NBV) on a continuous manifold. Leveraging the differentiability of the implicit representation, the NBV can be optimized directly by maximizing the view uncertainty using gradient descent. It significantly enhances the method's adaptability to different scenarios. Simulation and real-world experiments demonstrate that our approach effectively improves reconstruction accuracy and efficiency of view planning in active reconstruction tasks. The proposed system will open source at https://github.com/HITSZ-NRSL/ActiveImplicitRecon.git.

artificial intelligence, machine learning, reconstruction, (18 more...)

arXiv.org Artificial Intelligence

2303.16739

Country: Asia > China (0.29)

Genre: Research Report (0.50)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Vision (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Efficient Implicit Neural Reconstruction Using LiDAR

Yan, Dongyu, Lyu, Xiaoyang, Shi, Jieqi, Lin, Yi

arXiv.org Artificial IntelligenceFeb-28-2023

Modeling scene geometry using implicit neural representation has revealed its advantages in accuracy, flexibility, and low memory usage. Previous approaches have demonstrated impressive results using color or depth images but still have difficulty handling poor light conditions and large-scale scenes. Methods taking global point cloud as input require accurate registration and ground truth coordinate labels, which limits their application scenarios. In this paper, we propose a new method that uses sparse LiDAR point clouds and rough odometry to reconstruct fine-grained implicit occupancy field efficiently within a few minutes. We introduce a new loss function that supervises directly in 3D space without 2D rendering, avoiding information loss. We also manage to refine poses of input frames in an end-to-end manner, creating consistent geometry without global point cloud registration. As far as we know, our method is the first to reconstruct implicit scene representation from LiDAR-only input. Experiments on synthetic and real-world datasets, including indoor and outdoor scenes, prove that our method is effective, efficient, and accurate, obtaining comparable results with existing methods using dense input.

artificial intelligence, machine learning, reconstruction, (14 more...)

arXiv.org Artificial Intelligence

2302.14363

Country: Asia > China (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback