
Collaborating Authors

 He, Zhengmao


Learning Visual Quadrupedal Loco-Manipulation from Demonstrations

arXiv.org Artificial Intelligence

Quadruped robots are progressively being integrated into human environments. Despite the growing locomotion capabilities of quadrupedal robots, their interaction with objects in realistic scenes is still limited. While additional robotic arms on quadrupedal robots enable manipulating objects, they are sometimes redundant given that a quadruped robot is essentially a mobile unit equipped with four limbs, each possessing 3 degrees of freedom (DoFs). Hence, we aim to empower a quadruped robot to execute real-world manipulation tasks using only its legs. We decompose the loco-manipulation process into a low-level reinforcement learning (RL)-based controller and a high-level Behavior Cloning (BC)-based planner. By parameterizing the manipulation trajectory, we synchronize the efforts of the upper and lower layers, thereby leveraging the advantages of both RL and BC. Our approach is validated through simulations and real-world experiments, demonstrating the robot's ability to perform tasks that demand mobility and high precision, such as lifting a basket from the ground while moving, closing a dishwasher, pressing a button, and pushing a door. Project website: https://zhengmaohe.github.io/leg-manip
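
The abstract describes a two-level design: a high-level Behavior Cloning planner that outputs a parameterized manipulation trajectory, and a low-level RL controller that tracks it with the legs. Below is a minimal sketch of that decomposition; all dimensions, class names, and interfaces (TRAJ_PARAM_DIM, BCPlanner, RLController, the once-per-episode replanning) are illustrative assumptions, not the paper's actual code.

```python
import numpy as np

# Illustrative sizes only; the paper does not specify these.
TRAJ_PARAM_DIM = 8   # assumed size of the parameterized manipulation trajectory
PROPRIO_DIM = 48     # assumed proprioceptive state size (joint pos/vel, base state)
NUM_JOINTS = 12      # 4 legs x 3 DoFs each

class BCPlanner:
    """High-level planner trained with behavior cloning on demonstrations."""
    def __init__(self, policy_fn):
        self.policy_fn = policy_fn  # e.g. a trained vision network

    def plan(self, image):
        # Predict trajectory parameters (e.g. waypoints) from a camera image.
        return self.policy_fn(image)  # shape: (TRAJ_PARAM_DIM,)

class RLController:
    """Low-level controller trained with RL to track the commanded trajectory."""
    def __init__(self, policy_fn):
        self.policy_fn = policy_fn

    def act(self, proprio_state, traj_params, phase):
        # Condition joint targets on proprioception, trajectory parameters, and phase.
        obs = np.concatenate([proprio_state, traj_params, [phase]])
        return self.policy_fn(obs)  # shape: (NUM_JOINTS,)

def rollout(planner, controller, get_image, get_state, apply_action, horizon=200):
    traj_params = planner.plan(get_image())  # replan once per episode (assumption)
    for t in range(horizon):
        phase = t / horizon                  # normalized progress along the trajectory
        apply_action(controller.act(get_state(), traj_params, phase))

if __name__ == "__main__":
    # Dummy stand-ins so the sketch runs end to end.
    rng = np.random.default_rng(0)
    planner = BCPlanner(lambda img: rng.standard_normal(TRAJ_PARAM_DIM))
    controller = RLController(lambda obs: np.tanh(obs[:NUM_JOINTS]))
    rollout(planner, controller,
            get_image=lambda: rng.standard_normal((64, 64, 3)),
            get_state=lambda: rng.standard_normal(PROPRIO_DIM),
            apply_action=lambda a: None,
            horizon=10)
```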


Uni-O4: Unifying Online and Offline Deep Reinforcement Learning with Multi-Step On-Policy Optimization

arXiv.org Artificial Intelligence

Combining offline and online reinforcement learning (RL) is crucial for efficient and safe learning. However, previous approaches treat offline and online learning as separate procedures, resulting in redundant designs and limited performance. We ask: Can we achieve straightforward yet effective offline and online learning without introducing extra conservatism or regularization? In this study, we propose Uni-O4, which utilizes an on-policy objective for both offline and online learning. Owing to the alignment of objectives in the two phases, the RL agent can transfer between offline and online learning seamlessly. This property enhances the flexibility of the learning paradigm, allowing for arbitrary combinations of pretraining, fine-tuning, offline, and online learning. In the offline phase, specifically, Uni-O4 leverages diverse ensemble policies to address the mismatch between the estimated behavior policy and the offline dataset. Through a simple offline policy evaluation (OPE) approach, Uni-O4 can achieve multi-step policy improvement safely. We demonstrate that by employing the method above, the fusion of these two paradigms can yield superior offline initialization as well as stable and rapid online fine-tuning capabilities. Through real-world robot tasks, we highlight the benefits of this paradigm for rapid deployment in challenging, previously unseen real-world environments. Additionally, through comprehensive evaluations on numerous simulated benchmarks, we substantiate that our method achieves state-of-the-art performance in both offline learning and offline-to-online fine-tuning.

Imagine a scenario in which a reinforcement learning robot needs to function and improve itself in the real world: its policy might go through the pipeline of training online in a simulator, then offline with real-world data, and lastly online in the real world. However, current reinforcement learning algorithms usually focus on specific stages of learning, which complicates the effort to train robots within a single unified framework. Online RL algorithms require a substantial amount of interaction and exploration to attain strong performance, which is prohibitive in many real-world applications. Offline RL, in which agents learn from a fixed dataset generated by other behavior policies, is a potential solution.
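
The abstract's key idea is a single on-policy objective shared by the offline and online phases. A PPO-style clipped surrogate is a common instance of such an objective; the sketch below shows that loss only, with Uni-O4's exact objective, ensemble construction, and hyperparameters treated as unspecified assumptions.

```python
import torch

def clipped_surrogate_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    # Standard PPO-style clipped surrogate loss (a sketch, not Uni-O4's exact form).
    ratio = torch.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()

# Illustration of where logp_old would come from in each phase (assumption):
# online, it is the previous policy iterate evaluated on fresh rollouts;
# offline, it is an (ensemble-)estimated behavior policy evaluated on dataset
# transitions, which is what lets the same objective serve both phases.
if __name__ == "__main__":
    logp_old = torch.randn(256)
    logp_new = logp_old + 0.05 * torch.randn(256)
    advantages = torch.randn(256)
    print(clipped_surrogate_loss(logp_new, logp_old, advantages))
```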


ArrayBot: Reinforcement Learning for Generalizable Distributed Manipulation through Touch

arXiv.org Artificial Intelligence

The notion of robotic manipulation [1, 2] easily invokes the image of a biomimetic robot arm or hand trying to grasp tabletop objects and then rearrange them into desired configurations inferred by exteroceptive sensors such as RGBD cameras. To facilitate this manipulation pipeline, the robot learning community has made tremendous efforts in either how to determine steadier grasping poses in demanding scenarios [3, 4, 5, 6, 7] or how to understand the exteroceptive inputs in a more robust and generalizable way [8, 9, 10, 11, 12, 13]. Acknowledging this progress, this paper attempts to bypass the challenges in the prevailing pipeline by advocating ArrayBot, a reinforcement-learning-driven system for distributed manipulation [14], where the objects are manipulated through a great number of actuators with only proprioceptive tactile sensing [15, 16, 17, 18]. Conceptually, the hardware of ArrayBot is a 16×16 array of vertically sliding pillars, each of which can be independently actuated, leading to a 16×16 action space. Functionally, the actuators beneath a tabletop object can support its weight and at the same time cooperate to lift, tilt, and even translate it through proper motion policies.
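
To make the interface concrete, here is a minimal environment-style stand-in for the 16×16 actuation grid described above: each action entry commands one pillar's height change, and the observation is purely proprioceptive. The class name, height limits, step size, and the omission of object dynamics and tactile contact signals are all illustrative assumptions.

```python
import numpy as np

GRID = 16  # the 16x16 array of independently actuated sliding pillars

class ArrayBotSketch:
    """Hypothetical stand-in for the ArrayBot actuation interface (not the
    authors' code). Actions are per-pillar height deltas; observations are
    the current pillar heights (proprioception only)."""

    def __init__(self, max_height=0.1, max_delta=0.005):
        self.heights = np.zeros((GRID, GRID))
        self.max_height = max_height  # assumed travel limit per pillar (meters)
        self.max_delta = max_delta    # assumed per-step height change (meters)

    def step(self, action):
        # action: (16, 16) array in [-1, 1], one entry per pillar.
        delta = np.clip(np.asarray(action), -1.0, 1.0) * self.max_delta
        self.heights = np.clip(self.heights + delta, 0.0, self.max_height)
        return self.heights.copy()  # proprioceptive observation

if __name__ == "__main__":
    env = ArrayBotSketch()
    rng = np.random.default_rng(0)
    obs = env.step(rng.uniform(-1.0, 1.0, size=(GRID, GRID)))
    print(obs.shape)  # (16, 16) -- matches the 16x16 action space
```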