Sim-to-Real of Humanoid Locomotion Policies via Joint Torque Space Perturbation Injection

Woohyun Cha, Junhyeok Cha, Jaeyong Shin, Donghyeon Kim, Jaeheung Park

arXiv.org Artificial Intelligence 

This paper proposes a novel alternative to existing sim-to-real methods for training control policies with simulated experience. Prior sim-to-real methods for legged robots mostly rely on domain randomization, in which a fixed, finite set of simulation parameters is randomized during training. Instead, our method adds state-dependent perturbations to the input joint torque used for forward simulation during training. These state-dependent perturbations are designed to simulate a broader range of reality gaps than those captured by randomizing a fixed set of simulation parameters. Experimental results show that our method yields humanoid locomotion policies with greater robustness against complex reality gaps unseen in the training domain.

Deep Reinforcement Learning (DRL) for robotic applications has gained significant attention due to its demonstrated robustness and versatility. Although DRL algorithms are capable of solving complex, high-dimensional control problems, commonly used on-policy methods often require a prohibitively large amount of data, posing a substantial challenge when collecting sufficient samples solely from real hardware. Moreover, the exploration required for policy improvement in the early stages of training can raise safety concerns for both the physical robot and its operational environment.
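The core idea described above can be illustrated with a minimal sketch. Here, a randomly parameterized state-dependent map produces a torque offset that is added to the commanded joint torque before each forward-simulation step, in contrast to randomizing a fixed set of physical parameters. All class and variable names, and the linear perturbation model itself, are illustrative assumptions rather than the paper's exact design.

```python
import numpy as np

class TorquePerturbedStep:
    """Sketch of joint-torque-space perturbation injection (assumed design).

    Before each simulation step, a perturbation that depends on the joint
    state (positions q and velocities qd) is added to the commanded torque,
    emulating unmodeled dynamics that a fixed parameter set cannot capture.
    """

    def __init__(self, n_joints, scale=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.n = n_joints
        # Random linear map from joint state to a torque offset; in practice
        # this would be resampled per episode so the policy sees many gaps.
        self.W = rng.normal(0.0, scale, size=(n_joints, 2 * n_joints))

    def perturbed_torque(self, tau_cmd, q, qd):
        """Return the torque actually fed to the forward simulator."""
        state = np.concatenate([q, qd])
        return tau_cmd + self.W @ state  # state-dependent perturbation
```

A training loop would call `perturbed_torque` on the policy's output torque at every control step, so the dynamics the policy experiences differ from the nominal model in a state-dependent way.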