HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba

Wang, Yinuo, Qi, Yuanyang, Zhou, Jinzhao, Tao, Gavin

arXiv.org Artificial Intelligence 

Abstract--End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.

Humanoid locomotion demands controllers that are both foresightful and resource-aware: foresightful to coordinate accurate foot placement and whole-body balance, and resource-aware to run reliably under onboard compute and actuation limits [1]. End-to-end reinforcement learning (RL) is attractive because it can discover feedback strategies directly from interaction [2]; however, its effectiveness hinges on (i) how heterogeneous inputs are fused and (ii) how training is shaped to avoid trivial or unstable behaviors.
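The control interface described above, where the policy emits joint position targets that a low-level PD loop converts to torques, and where a continuous phase clock is part of the observation, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the gains `kp`/`kd`, the gait period, and the sin/cos encoding of the phase clock are all assumptions chosen for clarity.

```python
import numpy as np

def phase_clock(t, period=0.8):
    """Continuous phase clock encoded as (sin, cos) of the gait phase.

    The sin/cos encoding and the 0.8 s period are illustrative
    assumptions, not values from the paper.
    """
    phi = 2.0 * np.pi * (t % period) / period
    return np.array([np.sin(phi), np.cos(phi)])

def pd_torque(q_target, q, qdot, kp=80.0, kd=2.0):
    """Low-level PD loop tracking the policy's joint position targets.

    tau = kp * (q_target - q) - kd * qdot; gains are placeholders.
    """
    return kp * (q_target - q) - kd * qdot

# Example: one control step for a hypothetical 3-joint leg.
q = np.array([0.1, -0.2, 0.05])        # current joint positions (rad)
qdot = np.array([0.0, 0.1, -0.05])     # joint velocities (rad/s)
q_target = np.array([0.0, -0.3, 0.0])  # policy output (position targets)

tau = pd_torque(q_target, q, qdot)     # torques sent to the actuators
obs_clock = phase_clock(t=0.2)         # appended to the policy observation
```

In this scheme the RL policy never outputs torques directly; the PD loop runs at a higher rate than the policy, which tends to smooth actuation and, per the paper's results, helps reduce power consumption and torque peaks.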
