
Collaborating Authors

 Hsu, Cheng-Chun


SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation

arXiv.org Artificial Intelligence

We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task by an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, and enabling cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. Our approach improves over prior work on simulated RLBench tasks. In real-world evaluation, using only eight demonstrations shot on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
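The core representation is concrete enough to sketch: an object's SE(3) pose trajectory can be re-expressed in the frame of the target object and flattened into per-step features that condition a policy. The Python snippet below is a minimal illustration of that re-expression, not the paper's implementation; the array shapes and the flattening used for conditioning are assumptions.

# Minimal sketch (not the authors' code): expressing an object's SE(3) pose
# trajectory relative to a target frame, as described in the SPOT abstract.
# Shapes and the conditioning features are illustrative assumptions.
import numpy as np

def relative_pose_trajectory(T_world_obj, T_world_target):
    """T_world_obj: (N, 4, 4) object poses; T_world_target: (4, 4) target pose.
    Returns (N, 4, 4) object poses expressed in the target frame."""
    T_target_world = np.linalg.inv(T_world_target)
    return np.einsum("ij,njk->nik", T_target_world, T_world_obj)

# Toy usage: a straight-line approach trajectory toward the target.
T_target = np.eye(4)
T_obj = np.stack([np.eye(4) for _ in range(5)])
T_obj[:, 0, 3] = np.linspace(0.5, 0.0, 5)   # x moves toward the target
traj = relative_pose_trajectory(T_obj, T_target)

# Flatten into a conditioning vector for a trajectory-conditioned policy
# (e.g., a diffusion policy), as a stand-in for the paper's conditioning.
condition = traj[:, :3, :].reshape(len(traj), -1)   # (N, 12) per-step features
print(condition.shape)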


KinScene: Model-Based Mobile Manipulation of Articulated Scenes

arXiv.org Artificial Intelligence

Sequentially interacting with articulated objects is crucial for a mobile manipulator to operate effectively in everyday environments. To enable long-horizon tasks involving articulated objects, this study explores building scene-level articulation models for indoor scenes through autonomous exploration. While previous research has studied mobile manipulation with articulated objects by considering object kinematic constraints, it primarily focuses on individual-object scenarios and does not extend to scene-level reasoning for task-level planning. To manipulate multiple object parts sequentially, the robot needs to reason about the resultant motion of each part and anticipate its impact on future actions. We introduce KinScene, a full-stack approach for long-horizon manipulation tasks with articulated objects. The robot maps the scene, detects and physically interacts with articulated objects, collects observations, and infers the articulation properties. For sequential tasks, the robot plans a feasible series of object interactions based on the inferred articulation model. We demonstrate that our approach repeatably constructs accurate scene-level kinematic and geometric models, enabling long-horizon mobile manipulation in a real-world scene. Code and additional results are available at https://chengchunhsu.github.io/KinScene/
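As a rough illustration of what a scene-level articulation model might store, the sketch below (assumptions, not KinScene's actual code or data structures) keeps, for each detected part, an inferred joint type, axis, origin, and limits, and lets the robot query the motion a commanded joint value would produce before sequencing interactions.

# Minimal sketch of a scene-level articulation model: each part records its
# inferred joint parameters, and part_pose() returns the rigid motion a given
# joint value induces. Names and fields are assumptions for illustration.
from dataclasses import dataclass
import numpy as np

@dataclass
class ArticulatedPart:
    name: str
    joint_type: str          # "revolute" or "prismatic"
    axis: np.ndarray         # unit axis in the scene frame
    origin: np.ndarray       # a point on the axis (revolute) / part origin
    limits: tuple            # (min, max) joint value in rad or m

    def part_pose(self, q: float) -> np.ndarray:
        """Homogeneous transform moving the part from its mapped rest pose."""
        q = float(np.clip(q, *self.limits))
        T = np.eye(4)
        if self.joint_type == "prismatic":
            T[:3, 3] = self.axis * q
        else:  # revolute: Rodrigues' rotation about `axis` through `origin`
            k = self.axis / np.linalg.norm(self.axis)
            K = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
            R = np.eye(3) + np.sin(q) * K + (1 - np.cos(q)) * (K @ K)
            T[:3, :3] = R
            T[:3, 3] = self.origin - R @ self.origin
        return T

# Scene-level model: the robot reasons over all parts before acting.
scene = [
    ArticulatedPart("cabinet_door", "revolute", np.array([0, 0, 1.0]),
                    np.array([0.8, 0.2, 0.0]), (0.0, np.pi / 2)),
    ArticulatedPart("drawer", "prismatic", np.array([1.0, 0, 0]),
                    np.array([0.8, -0.3, 0.4]), (0.0, 0.35)),
]
for part in scene:
    print(part.name, part.part_pose(0.3)[:3, 3])   # translation each motion induces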


Ditto in the House: Building Articulation Models of Indoor Scenes through Interactive Perception

arXiv.org Artificial Intelligence

Abstract -- Virtualizing the physical world into virtual models has been a critical technique for robot navigation and planning in the real world. Previous approaches primarily focus on individual objects, whereas scaling to room-sized environments requires the robot to efficiently and effectively explore the large-scale 3D space. We introduce an interactive perception approach to this task. The robot discovers and physically interacts with the articulated objects in the environment, collecting observations before and after each interaction. Based on these visual observations, the robot infers the articulation properties of the objects.
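The inference step can be illustrated with a classical stand-in: given corresponding points on a part observed before and after an interaction, fit the rigid motion and read off a prismatic or revolute joint hypothesis. The sketch below uses the Kabsch algorithm for this and is an assumption-laden substitute for the paper's learned model.

# Minimal sketch (a classical stand-in, not the paper's learned inference):
# fit the rigid motion between corresponding part points observed before and
# after an interaction, then hypothesize the joint type and axis.
import numpy as np

def fit_rigid_motion(P_before, P_after):
    """Least-squares R, t with P_after ~= R @ P_before + t (Kabsch)."""
    cb, ca = P_before.mean(0), P_after.mean(0)
    H = (P_before - cb).T @ (P_after - ca)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1, 1, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = ca - R @ cb
    return R, t

def joint_hypothesis(R, t, angle_eps=1e-3):
    angle = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if angle < angle_eps:                      # pure translation -> prismatic
        return "prismatic", t / (np.linalg.norm(t) + 1e-12), np.linalg.norm(t)
    w, V = np.linalg.eig(R)                    # rotation axis: eigvec for eigval 1
    axis = np.real(V[:, np.argmin(np.abs(w - 1))])
    return "revolute", axis / np.linalg.norm(axis), angle

# Toy check: a drawer pulled 0.2 m along x.
P0 = np.random.rand(100, 3)
P1 = P0 + np.array([0.2, 0.0, 0.0])
R, t = fit_rigid_motion(P0, P1)
print(joint_hypothesis(R, t))   # ('prismatic', ~[1, 0, 0], ~0.2)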