DreamControl: Human-Inspired Whole-Body Humanoid Control for Scene Interaction via Guided Diffusion

Dvij Kalaria, Sudarshan S Harithas, Pushkal Katara, Sangkyung Kwak, Sarthak Bhagat, Shankar Sastry, Srinath Sridhar, Sai Vemprala, Ashish Kapoor, Jonathan Chung-Kuan Huang

arXiv.org Artificial Intelligence 

Abstract: We introduce DreamControl, a novel methodology for learning autonomous whole-body humanoid skills. DreamControl leverages the strengths of diffusion models and Reinforcement Learning (RL): our core innovation is the use of a diffusion prior trained on human motion data, which subsequently guides an RL policy in simulation to complete specific tasks of interest (e.g., opening a drawer or picking up an object). We demonstrate that this human motion-informed prior allows RL to discover solutions unattainable by direct RL, and that diffusion models inherently promote natural-looking motions, aiding in sim-to-real transfer. We validate DreamControl's effectiveness on a Unitree G1 robot across a diverse set of challenging tasks involving simultaneous lower and upper body control and object interaction.

Significant advancements in humanoid robot control have been made in recent years, particularly in locomotion and motion tracking, leading to impressive demonstrations such as robot dancing [1], [2] and kung-fu [3]. However, for humanoid robots to transition from mere exhibitions to universal assistants, they must be able to interact with their environment by fully leveraging their humanoid form factor's mobility and extensive range of motion. This includes tasks such as stooping to pick up objects, squatting for heavy boxes, bracing to open drawers or doors, and precise pushing, punching, or kicking of specific targets. These tasks are sometimes referred to as whole-body manipulation and loco-manipulation tasks, and continue to pose substantial challenges for the humanoid robotics field. Existing approaches to humanoid manipulation often simplify the problem by fixing the lower body (e.g., [4]), training upper and lower bodies separately with the lower body reacting to the upper (e.g., [5]), or focusing exclusively on computer graphics applications (e.g., [6], [7]).
