AITopics | Li, Yishu

Collaborating Authors

Li, Yishu

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FlowBotHD: History-Aware Diffuser Handling Ambiguities in Articulated Objects Manipulation

Li, Yishu, Leng, Wen Hui, Fang, Yiming, Eisner, Ben, Held, David

arXiv.org Artificial IntelligenceDec-28-2024

We introduce a novel approach for manipulating articulated objects which are visually ambiguous, such doors which are symmetric or which are heavily occluded. These ambiguities can cause uncertainty over different possible articulation modes: for instance, when the articulation direction (e.g. push, pull, slide) or location (e.g. left side, right side) of a fully closed door are uncertain, or when distinguishing features like the plane of the door are occluded due to the viewing angle. To tackle these challenges, we propose a history-aware diffusion network that can model multi-modal distributions over articulation modes for articulated objects; our method further uses observation history to distinguish between modes and make stable predictions under occlusions. Experiments and analysis demonstrate that our method achieves state-of-art performance on articulated object manipulation and dramatically improves performance for articulated objects containing visual ambiguities. Our project website is available at https://flowbothd.github.io/.

artificial intelligence, machine learning, prediction, (16 more...)

arXiv.org Artificial Intelligence

2410.07078

Country: North America > United States (0.46)

Genre:

Workflow (1.00)
Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Vision (0.68)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)

Add feedback

MonoPlane: Exploiting Monocular Geometric Cues for Generalizable 3D Plane Reconstruction

Zhao, Wang, Liu, Jiachen, Zhang, Sheng, Li, Yishu, Chen, Sili, Huang, Sharon X, Liu, Yong-Jin, Guo, Hengkai

arXiv.org Artificial IntelligenceNov-2-2024

This paper presents a generalizable 3D plane detection and reconstruction framework named MonoPlane. Unlike previous robust estimator-based works (which require multiple images or RGB-D input) and learning-based works (which suffer from domain shift), MonoPlane combines the best of two worlds and establishes a plane reconstruction pipeline based on monocular geometric cues, resulting in accurate, robust and scalable 3D plane detection and reconstruction in the wild. Specifically, we first leverage large-scale pre-trained neural networks to obtain the depth and surface normals from a single image. These monocular geometric cues are then incorporated into a proximity-guided RANSAC framework to sequentially fit each plane instance. We exploit effective 3D point proximity and model such proximity via a graph within RANSAC to guide the plane fitting from noisy monocular depths, followed by image-level multi-plane joint optimization to improve the consistency among all plane instances. We further design a simple but effective pipeline to extend this single-view solution to sparse-view 3D plane reconstruction. Extensive experiments on a list of datasets demonstrate our superior zero-shot generalizability over baselines, achieving state-of-the-art plane reconstruction performance in a transferring setting. Our code is available at https://github.com/thuzhaowang/MonoPlane .

artificial intelligence, machine learning, reconstruction, (17 more...)

arXiv.org Artificial Intelligence

2411.01226

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Add feedback