HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba
Yinuo Wang, Yuanyang Qi, Jinzhao Zhou, Gavin Tao
–arXiv.org Artificial Intelligence
Abstract--End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. On the JVRC-1 humanoid in mc_mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.

Humanoid locomotion demands controllers that are both foresightful and resource-aware: foresightful to coordinate accurate foot placement and whole-body balance, and resource-aware to run reliably under onboard compute and actuation limits [1]. End-to-end reinforcement learning (RL) is attractive because it can discover feedback strategies directly from interaction [2]; however, its effectiveness hinges on (i) how heterogeneous inputs are fused and (ii) how training is shaped to avoid trivial or unstable behaviors.
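Two of the components named in the abstract, the continuous phase clock fed to the policy and the low-level PD loop that tracks the policy's joint position targets, can be sketched concretely. The abstract gives no gains or clock period, so the values below (an 0.8 s gait period, kp = 80, kd = 2, zero target joint velocity) are illustrative assumptions, not figures from the paper:

```python
import numpy as np

def phase_clock(t, period=0.8):
    """Continuous phase clock: encode gait phase as a point on the unit
    circle, so the policy input varies smoothly instead of jumping at the
    end of each cycle. The period is an illustrative assumption."""
    phi = 2.0 * np.pi * (t % period) / period
    return np.array([np.sin(phi), np.cos(phi)])

def pd_torque(q, qd, q_target, kp=80.0, kd=2.0):
    """Low-level PD loop: convert the policy's joint position targets into
    joint torques. Gains are illustrative; the target joint velocity is
    taken as zero, a common choice for position-target tracking."""
    return kp * (q_target - q) - kd * qd

# At each control step, the policy would receive the robot-centric state
# augmented with phase_clock(t), and its action q_target would be tracked by:
#   tau = pd_torque(q, qd, q_target)
```

Running the PD loop at a higher rate than the policy (a standard arrangement in RL-based locomotion) keeps the torque signal smooth between policy updates.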
Sep-23-2025