AITopics | Wang, Jilong

Collaborating Authors

Wang, Jilong

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real

Liu, Weiheng, Wan, Yuxuan, Wang, Jilong, Kuang, Yuxuan, Shi, Xuesong, Li, Haoran, Zhao, Dongbin, Zhang, Zhizheng, Wang, He

arXiv.org Artificial IntelligenceFeb-25-2025

Object fetching from cluttered shelves is an important capability for robots to assist humans in real-world scenarios. Achieving this task demands robotic behaviors that prioritize safety by minimizing disturbances to surrounding objects, an essential but highly challenging requirement due to restricted motion space, limited fields of view, and complex object dynamics. In this paper, we introduce FetchBot, a sim-to-real framework designed to enable zero-shot generalizable and safety-aware object fetching from cluttered shelves in real-world settings. To address data scarcity, we propose an efficient voxel-based method for generating diverse simulated cluttered shelf scenes at scale and train a dynamics-aware reinforcement learning (RL) policy to generate object fetching trajectories within these scenes. This RL policy, which leverages oracle information, is subsequently distilled into a vision-based policy for real-world deployment. Considering that sim-to-real discrepancies stem from texture variations mostly while from geometric dimensions rarely, we propose to adopt depth information estimated by full-fledged depth foundation models as the input for the vision-based policy to mitigate sim-to-real gap. To tackle the challenge of limited views, we design a novel architecture for learning multi-view representations, allowing for comprehensive encoding of cluttered shelf scenes. This enables FetchBot to effectively minimize collisions while fetching objects from varying positions and depths, ensuring robust and safety-aware operation. Both simulation and real-robot experiments demonstrate FetchBot's superior generalization ability, particularly in handling a broad range of real-world scenarios, includ

arxiv preprint arxiv, large language model, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2502.17894

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Robots > Manipulation (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control

Do, Tan-Dzung, Gireesh, Nandiraju, Wang, Jilong, Wang, He

arXiv.org Artificial IntelligenceFeb-20-2025

Watch Less, Feel More: Sim-to-Real RL for Generalizable Articulated Object Manipulation via Motion Adaptation and Impedance Control Tan-Dzung Do 1,2, Nandiraju Gireesh 1,2, Jilong Wang 2, and He Wang 1,2, Figure 1: We train an RL policy to open doors and drawers in simulation that adapts its action according to the motion of objects by leveraging history observations (left). We directly transfer this policy to reach 80% joint limit in the real world with closed-loop variable impedance control and achieve 84% success rate, using only one first-frame RGBD image (right). Abstract -- Articulated object manipulation poses a unique challenge compared to rigid object manipulation as the object itself represents a dynamic environment. In this work, we present a novel RL-based pipeline equipped with variable impedance control and motion adaptation leveraging observation history for generalizable articulated object manipulation, focusing on smooth and dexterous motion during zero-shot sim-to-real transfer (Figure 1). T o mitigate the sim-to-real gap, our pipeline diminishes reliance on vision by not leveraging the vision data feature (RGBD/pointcloud) directly as policy input but rather extracting useful low-dimensional data first via off-the-shelf modules. Additionally, we experience less sim-to-real gap by inferring object motion and its intrinsic properties via observation history as well as utilizing impedance control both in the simulation and in the real world. Furthermore, we develop a well-designed training setting with great randomization and a specialized reward system (task-aware and motion-aware) that enables multi-staged, end-to-end manipulation without heuristic motion planning.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.14457

Genre: Research Report (0.50)

Industry: Energy (0.36)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.34)

Add feedback

QuadWBG: Generalizable Quadrupedal Whole-Body Grasping

Wang, Jilong, Rajabov, Javokhirbek, Xu, Chaoyi, Zheng, Yiming, Wang, He

arXiv.org Artificial IntelligenceJan-13-2025

Legged robots with advanced manipulation capabilities have the potential to significantly improve household duties and urban maintenance. Despite considerable progress in developing robust locomotion and precise manipulation methods, seamlessly integrating these into cohesive whole-body control for real-world applications remains challenging. In this paper, we present a modular framework for robust and generalizable whole-body loco-manipulation controller based on a single arm-mounted camera. By using reinforcement learning (RL), we enable a robust low-level policy for command execution over 5 dimensions (5D) and a grasp-aware high-level policy guided by a novel metric, Generalized Oriented Reachability Map (GORM). The proposed system achieves state-of-the-art one-time grasping accuracy of 89% in the real world, including challenging tasks such as grasping transparent objects. Through extensive simulations and real-world experiments, we demonstrate that our system can effectively manage a large workspace, from floor level to above body height, and perform diverse whole-body loco-manipulation tasks.

artificial intelligence, locomotion, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2411.06782

Country: North America > Canada > Ontario > Toronto (0.14)

Genre: Research Report (0.50)

Industry: Energy (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Robots > Locomotion (0.49)

Add feedback

MobileH2R: Learning Generalizable Human to Mobile Robot Handover Exclusively from Scalable and Diverse Synthetic Data

Wang, Zifan, Chen, Ziqing, Chen, Junyu, Wang, Jilong, Yang, Yuxin, Liu, Yunze, Liu, Xueyi, Wang, He, Yi, Li

arXiv.org Artificial IntelligenceJan-8-2025

This paper introduces MobileH2R, a framework for learning generalizable vision-based human-to-mobile-robot (H2MR) handover skills. Unlike traditional fixed-base handovers, this task requires a mobile robot to reliably receive objects in a large workspace enabled by its mobility. Our key insight is that generalizable handover skills can be developed in simulators using high-quality synthetic data, without the need for real-world demonstrations. To achieve this, we propose a scalable pipeline for generating diverse synthetic full-body human motion data, an automated method for creating safe and imitation-friendly demonstrations, and an efficient 4D imitation learning method for distilling large-scale demonstrations into closed-loop policies with base-arm coordination. Experimental evaluations in both simulators and the real world show significant improvements (at least +15% success rate) over baseline methods in all cases. Experiments also validate that large-scale and diverse synthetic data greatly enhances robot learning, highlighting our scalable framework.

artificial intelligence, demonstration, robot, (17 more...)

arXiv.org Artificial Intelligence

2501.04595

Country: Asia (0.14)

Genre: Research Report > Promising Solution (0.46)

Industry: Energy (0.48)

Technology: Information Technology > Artificial Intelligence > Robots > Locomotion (0.82)

Add feedback

GAMMA: Graspability-Aware Mobile MAnipulation Policy Learning based on Online Grasping Pose Fusion

Zhang, Jiazhao, Gireesh, Nandiraju, Wang, Jilong, Fang, Xiaomeng, Xu, Chaoyi, Chen, Weiguang, Dai, Liu, Wang, He

arXiv.org Artificial IntelligenceSep-27-2023

Mobile manipulation constitutes a fundamental task for robotic assistants and garners significant attention within the robotics community. A critical challenge inherent in mobile manipulation is the effective observation of the target while approaching it for grasping. In this work, we propose a graspability-aware mobile manipulation approach powered by an online grasping pose fusion framework that enables a temporally consistent grasping observation. Specifically, the predicted grasping poses are online organized to eliminate the redundant, outlier grasping poses, which can be encoded as a grasping pose observation state for reinforcement learning. Moreover, on-the-fly fusing the grasping poses enables a direct assessment of graspability, encompassing both the quantity and quality of grasping poses.

artificial intelligence, graspability-aware mobile manipulation policy learning, online grasping pose fusion, (1 more...)

arXiv.org Artificial Intelligence

2309.15459

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.69)

Add feedback