HuMam: Humanoid Motion Control via End-to-End Deep Reinforcement Learning with Mamba

Wang, Yinuo, Qi, Yuanyang, Zhou, Jinzhao, Tao, Gavin

arXiv.org Artificial Intelligence

Abstract--End-to-end reinforcement learning (RL) for humanoid locomotion is appealing for its compact perception-action mapping, yet practical policies often suffer from training instability, inefficient feature fusion, and high actuation cost. We present HuMam, a state-centric end-to-end RL framework that employs a single-layer Mamba encoder to fuse robot-centric states with oriented footstep targets and a continuous phase clock. The policy outputs joint position targets tracked by a low-level PD loop and is optimized with PPO. On the JVRC-1 humanoid in mc-mujoco, HuMam consistently improves learning efficiency, training stability, and overall task performance over a strong feedforward baseline, while reducing power consumption and torque peaks. To our knowledge, this is the first end-to-end humanoid RL controller that adopts Mamba as the fusion backbone, demonstrating tangible gains in efficiency, stability, and control economy.

Humanoid locomotion demands controllers that are both foresightful and resource-aware: foresightful to coordinate accurate foot placement and whole-body balance, and resource-aware to run reliably under onboard compute and actuation limits [1]. End-to-end reinforcement learning (RL) is attractive because it can discover feedback strategies directly from interaction [2]; however, its effectiveness hinges on (i) how heterogeneous inputs are fused and (ii) how training is shaped to avoid trivial or unstable behaviors.
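Two of the components named in this abstract are simple enough to sketch concretely: the continuous phase clock and the low-level PD loop that tracks the policy's joint position targets. This is a minimal sketch; the gains, joint count, and clock period are illustrative assumptions rather than the paper's values, and the Mamba fusion itself is omitted.

```python
import math

def phase_clock(t, period=0.8):
    """Continuous phase clock encoded as (sin, cos) to avoid a
    discontinuity at the gait-cycle wrap-around."""
    ph = 2 * math.pi * (t % period) / period
    return math.sin(ph), math.cos(ph)

def pd_torques(q, qd, q_target, kp=80.0, kd=2.0):
    """Map policy joint-position targets to joint torques via PD feedback.
    Gains kp/kd are placeholders, not tuned values."""
    return [kp * (qt - qi) - kd * qdi for qi, qdi, qt in zip(q, qd, q_target)]

# One control step: the RL policy emits q_target at a low rate;
# the PD loop tracks it at a higher rate.
q = [0.0, 0.1, -0.2]          # current joint positions (rad)
qd = [0.0, 0.0, 0.0]          # current joint velocities (rad/s)
q_target = [0.1, 0.0, -0.1]   # hypothetical policy output
tau = pd_torques(q, qd, q_target)
s, c = phase_clock(0.0)
```

The (sin, cos) encoding keeps the clock input continuous across cycle boundaries, which is the usual reason such clocks appear in locomotion observations.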


Application of an attention-based CNN-BiLSTM framework for in vivo two-photon calcium imaging of neuronal ensembles: decoding complex bilateral forelimb movements from unilateral M1

Mirzaee, Ghazal, Chang, Jonathan, Latifi, Shahrzad

arXiv.org Artificial Intelligence

Decoding behavior, such as movement, from multiscale brain networks remains a central objective in neuroscience. Over the past decades, artificial intelligence and machine learning have played an increasingly significant role in elucidating the neural mechanisms underlying motor function. The advancement of brain-monitoring technologies, capable of capturing complex neuronal signals with high spatial and temporal resolution, necessitates the development and application of more sophisticated machine learning models for behavioral decoding. In this study, we employ a hybrid deep learning framework, an attention-based CNN-BiLSTM model, to decode skilled and complex forelimb movements using signals obtained from in vivo two-photon calcium imaging. Our findings demonstrate that the intricate movements of both ipsilateral and contralateral forelimbs can be accurately decoded from unilateral M1 neuronal ensembles. These results highlight the efficacy of advanced hybrid deep learning models in capturing the spatiotemporal dependencies of neuronal network activity linked to complex movement execution.
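The attention mechanism in a hybrid decoder of this kind typically pools per-timestep hidden states (e.g., BiLSTM outputs) into a single context vector via learned scores. A minimal dependency-free sketch of that pooling step, with toy vectors and weights standing in for learned parameters:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(hidden, score_w):
    """Pool per-timestep hidden vectors into one context vector.
    Each timestep gets a scalar score (dot product with score_w),
    scores are softmax-normalized, and the context is the weighted sum."""
    scores = [sum(w * h for w, h in zip(score_w, ht)) for ht in hidden]
    alphas = softmax(scores)
    dim = len(hidden[0])
    return [sum(a * ht[d] for a, ht in zip(alphas, hidden)) for d in range(dim)]

# Toy example: two timesteps of 2-d hidden states, untrained (zero) weights
# give uniform attention, so the context is the mean of the states.
hidden = [[1.0, 0.0], [0.0, 1.0]]
ctx = attention_pool(hidden, [0.0, 0.0])
```

In a trained model the score weights would be learned so that informative frames (e.g., around movement onset) receive larger attention weights.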


UNB StepUP: A footStep database for gait analysis and recognition using Underfoot Pressure

Larracy, Robyn, Phinyomark, Angkoon, Salehi, Ala, MacDonald, Eve, Kazemi, Saeed, Bashar, Shikder Shafiul, Tabor, Aaron, Scheme, Erik

arXiv.org Artificial Intelligence

Gait refers to the patterns of limb movement generated during walking, which are unique to each individual due to both physical and behavioural traits. Walking patterns have been widely studied in biometrics, biomechanics, sports, and rehabilitation. While traditional methods rely on video and motion capture, advances in underfoot pressure sensing technology now offer deeper insights into gait. However, underfoot pressures during walking remain underexplored due to the lack of large, publicly accessible datasets. To address this, the UNB StepUP database was created, featuring gait pressure data collected with high-resolution pressure sensing tiles (4 sensors/cm$^2$, 1.2m by 3.6m). Its first release, UNB StepUP-P150, includes over 200,000 footsteps from 150 individuals across various walking speeds (preferred, slow-to-stop, fast, and slow) and footwear types (barefoot, standard shoes, and two personal shoes). As the largest and most comprehensive dataset of its kind, it supports biometric gait recognition while presenting new research opportunities in biomechanics and deep learning. The UNB StepUP-P150 dataset sets a new benchmark for pressure-based gait analysis and recognition. Please note that the hypertext links to the dataset on FigShare remain dormant while the document is under review.


How a School Shooting Became a Video Game

The New Yorker

The Final Exam, a recently released video game in which you play as a student caught amid a school shooting, lasts for around ten minutes, about the length of a real shooting event in a U.S. school. The game opens in an empty locker room. You hear distant gunfire, screams, harried footsteps, and the thudding of heavy furniture being overturned. The sense of disharmony is immediate: a familiar scene of youth and learning is grimly debased into one of peril. As the lockers surround you, their doors gaping, you feel caged: get me out of here. Moments later, as you enter the gymnasium, a two-minute countdown flashes on screen.


A Behavior Architecture for Fast Humanoid Robot Door Traversals

Calvert, Duncan, Penco, Luigi, Anderson, Dexton, Bialek, Tomasz, Chatterjee, Arghya, Mishra, Bhavyansh, Clark, Geoffrey, Bertrand, Sylvain, Griffin, Robert

arXiv.org Artificial Intelligence

Toward humanoid robots serving as squad mates in urban operations and other domains, we identified doors as a major capability gap. In this paper, we focus on the ability of humanoid robots to navigate and deal with doors. Human-sized doors are ubiquitous in many environments, and the humanoid form factor is uniquely suited to operate and traverse them. We present an architecture that incorporates GPU-accelerated perception and a tree-based interactive behavior coordination system with a whole-body motion and walking controller. Our system can perform door traversals on a variety of door types; it supports rapid authoring of behaviors for unseen door types and techniques for reusing those authored behaviors. The behaviors are modelled as trees and feature logical reactivity and action sequences that can be executed with layered concurrency to increase speed. Primitive actions are built on top of our existing whole-body controller, which supports manipulation while walking. We include a perception system using both neural networks and classical computer vision for door mechanism detection outside of the lab environment. We present operator-robot interdependence analysis charts to explore how human cognition is combined with artificial intelligence to produce complex robot behavior. Finally, we present and discuss real-robot performances of fast door traversals on our Nadia humanoid robot. Videos online at https://www.youtube.com/playlist?list=PLXuyT8w3JVgMPaB5nWNRNHtqzRK8i68dy.
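The tree-based behavior coordination described here follows the standard behavior-tree pattern of fallback (selector) and sequence nodes. Below is a minimal sketch with a hypothetical door-traversal fragment; the node semantics are the generic ones from the behavior-tree literature, not necessarily the paper's exact implementation:

```python
# Generic behavior-tree node semantics: a Sequence succeeds only if all
# children succeed; a Fallback returns the first non-failure result.
SUCCESS, FAILURE, RUNNING = "success", "failure", "running"

class Sequence:
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for c in self.children:
            s = c.tick()
            if s != SUCCESS:
                return s
        return SUCCESS

class Fallback:
    def __init__(self, *children):
        self.children = children
    def tick(self):
        for c in self.children:
            s = c.tick()
            if s != FAILURE:
                return s
        return FAILURE

class Action:
    def __init__(self, fn):
        self.fn = fn
    def tick(self):
        return self.fn()

# Hypothetical door fragment: pass through if already open, else
# operate the handle first. Names are illustrative, not from the paper.
door_open = {"value": False}
def check_open():
    return SUCCESS if door_open["value"] else FAILURE
def turn_handle():
    door_open["value"] = True
    return SUCCESS

tree = Fallback(Action(check_open),
                Sequence(Action(turn_handle), Action(check_open)))
result = tree.tick()
```

Ticking the root every control cycle is what gives such trees their reactivity: a condition that changes between ticks (the door closing again, say) reroutes execution without any explicit state machine.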


Online DNN-driven Nonlinear MPC for Stylistic Humanoid Robot Walking with Step Adjustment

Romualdi, Giulio, Viceconte, Paolo Maria, Moretti, Lorenzo, Sorrentino, Ines, Dafarra, Stefano, Traversaro, Silvio, Pucci, Daniele

arXiv.org Artificial Intelligence

This paper presents a three-layered architecture that enables stylistic locomotion with online contact location adjustment. Our method combines an autoregressive Deep Neural Network (DNN) acting as a trajectory generation layer with model-based trajectory adjustment and trajectory control layers. The DNN produces centroidal and postural references that serve as an initial guess and regularizer for the other layers. Because the DNN is trained on human motion capture data, the resulting robot motion exhibits locomotion patterns resembling a human walking style. The trajectory adjustment layer utilizes nonlinear optimization to ensure dynamically feasible center of mass (CoM) motion while addressing step adjustments. We compare two implementations of the trajectory adjustment layer: one as a receding horizon planner (RHP) and the other as a model predictive controller (MPC). To enhance MPC performance, we introduce a Kalman filter to reduce measurement noise. The filter parameters are automatically tuned with a genetic algorithm. Experimental results on the ergoCub humanoid robot demonstrate the system's ability to prevent falls, replicate human walking styles, and withstand disturbances of up to 68 N. Website: https://sites.google.com/view/dnn-mpc-walking YouTube video: https://www.youtube.com/watch?v=x3tzEfxO-xQ
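The measurement-noise filtering step can be illustrated with a scalar Kalman filter. The process and measurement variances `q` and `r` below are exactly the kind of parameters the paper tunes with a genetic algorithm; the values here are placeholders, not the tuned ones:

```python
def kalman_1d(measurements, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant-state model x_k = x_{k-1} + w,
    z_k = x_k + v, with process variance q and measurement variance r.
    Returns the filtered estimate after each measurement."""
    x, p = x0, p0
    out = []
    for z in measurements:
        p = p + q                 # predict: state unchanged, uncertainty grows
        k = p / (p + r)           # Kalman gain
        x = x + k * (z - x)       # update with the innovation
        p = (1.0 - k) * p
        out.append(x)
    return out

# Constant true signal of 1.0 observed with noise variance r = 1.0:
# each estimate moves toward the measurement, with shrinking gain.
filtered = kalman_1d([1.0, 1.0, 1.0], q=0.0, r=1.0, x0=0.0, p0=1.0)
```

A genetic algorithm would search over (q, r) to minimize some tracking-error objective on logged data, which avoids hand-tuning the variances.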


iWalker: Imperative Visual Planning for Walking Humanoid Robot

Lin, Xiao, Huang, Yuhao, Fu, Taimeng, Xiong, Xiaobin, Wang, Chen

arXiv.org Artificial Intelligence

Humanoid robots, with the potential to perform a broad range of tasks in environments designed for humans, have been deemed crucial as a basis for general AI agents. For planning and control, although traditional models and task-specific methods have been extensively studied over the past few decades, they are inadequate for achieving the flexibility and versatility needed for general autonomy. Learning approaches, especially reinforcement learning, are powerful and popular, but they are inherently "blind" during training, relying heavily on trials in simulation without proper guidance from physical principles or underlying dynamics. In response, we propose a novel end-to-end pipeline that seamlessly integrates perception, planning, and model-based control for humanoid robot walking. We refer to our method as iWalker, which is driven by imperative learning (IL), a self-supervising neuro-symbolic learning framework. This enables the robot to learn from arbitrary unlabeled data, significantly improving its adaptability and generalization capabilities. In experiments, iWalker demonstrates effectiveness in both simulated and real-world environments, representing a significant advancement toward versatile and autonomous humanoid robots.


High-Speed and Impact Resilient Teleoperation of Humanoid Robots

Bertrand, Sylvain, Penco, Luigi, Anderson, Dexton, Calvert, Duncan, Roy, Valentine, McCrory, Stephen, Mohammed, Khizar, Sanchez, Sebastian, Griffith, Will, Morfey, Steve, Maslyczyk, Alexis, Mohan, Achintya, Castello, Cody, Ma, Bingyin, Suryavanshi, Kartik, Dills, Patrick, Pratt, Jerry, Ragusila, Victor, Shrewsbury, Brandon, Griffin, Robert

arXiv.org Artificial Intelligence

Teleoperation of humanoid robots has long been a challenging domain, necessitating advances in both hardware and software to achieve seamless and intuitive control. This paper presents an integrated solution built on several elements: calibration-free motion capture and retargeting, a low-latency whole-body kinematics streaming toolbox, and high-bandwidth cycloidal actuators. Our motion retargeting approach stands out for its simplicity, requiring only 7 IMUs to generate full-body references for the robot. The kinematics streaming toolbox ensures real-time, responsive control of the robot's movements, significantly reducing latency and enhancing operational efficiency. Additionally, the use of cycloidal actuators makes it possible to withstand high speeds and impacts with the environment. Together, these approaches contribute to a teleoperation framework that offers unprecedented performance. Experimental results on the humanoid robot Nadia demonstrate the effectiveness of the integrated system.


RL-augmented MPC Framework for Agile and Robust Bipedal Footstep Locomotion Planning and Control

Bang, Seung Hyeon, Jové, Carlos Arribalzaga, Sentis, Luis

arXiv.org Artificial Intelligence

This paper proposes an online bipedal footstep planning strategy that combines model predictive control (MPC) and reinforcement learning (RL) to achieve agile and robust bipedal maneuvers. While MPC-based foot placement controllers have demonstrated their effectiveness in achieving dynamic locomotion, their performance is often limited by the use of simplified models and assumptions. To address this challenge, we develop a novel foot placement controller that leverages a learned policy to bridge the gap between the simplified model and the more complex full-order robot system. Specifically, our approach combines an ALIP-based MPC foot placement controller, which produces sub-optimal footstep plans, with a learned policy that refines the footstep adjustments, enabling the resulting footstep policy to capture the robot's whole-body dynamics effectively. We validate the effectiveness of our framework through a series of experiments using the full-body humanoid robot DRACO 3. The results demonstrate significant improvements in dynamic locomotion performance, including better tracking of a wide range of walking speeds and reliable turning and traversal of challenging terrains, while preserving the robustness and stability of the walking gaits compared to the baseline ALIP-based MPC approach.

Agile and robust bipedal locomotion is essential for humanoid robots to achieve human-level performance. One of the main challenges is designing a footstep policy that enables bipeds to constantly adjust their planned footstep positions to maintain balance and achieve agile, fast maneuvers, even under adverse conditions such as external disturbances or challenging terrain.
In this paper, we present an RL-augmented MPC framework designed to generate a footstep policy for agile and robust bipedal locomotion.
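The division of labor described above, a simplified-model planner producing a nominal footstep that a learned policy then refines with a bounded residual, can be sketched as follows. The linear-inverted-pendulum placement rule, feedback gain, and residual bound are illustrative assumptions, not the paper's ALIP-MPC formulation:

```python
import math

def lip_nominal_footstep(com_x, vel_x, vel_des, z0=0.8, g=9.81, k_v=0.1):
    """Capture-point-style foot placement from a linear inverted pendulum:
    step to the instantaneous capture point, plus a velocity-tracking term.
    z0 is the assumed constant CoM height; k_v is an illustrative gain."""
    omega = math.sqrt(g / z0)          # LIP natural frequency
    capture = vel_x / omega            # capture-point offset from the CoM
    return com_x + capture + k_v * (vel_x - vel_des)

def refined_footstep(nominal, residual_policy, state, bound=0.05):
    """The learned policy contributes only a bounded residual (here +/-5 cm),
    so the model-based plan remains the backbone of the final footstep."""
    delta = max(-bound, min(bound, residual_policy(state)))
    return nominal + delta

# Walking at the desired speed: the nominal step is just the capture point.
nominal = lip_nominal_footstep(com_x=0.0, vel_x=0.5, vel_des=0.5)
# A hypothetical policy output of 0.2 m is clipped to the 0.05 m bound.
refined = refined_footstep(nominal, lambda s: 0.2, state=None)
```

Clipping the residual is one simple way to keep the learned correction from overriding the stability guarantees of the model-based planner; the paper's actual mechanism may differ.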


Learning Semantic Traversability with Egocentric Video and Automated Annotation Strategy

Kim, Yunho, Lee, Jeong Hyun, Lee, Choongin, Mun, Juhyeok, Youm, Donghoon, Park, Jeongsoo, Hwangbo, Jemin

arXiv.org Artificial Intelligence

For reliable autonomous robot navigation in urban settings, the robot must have the ability to identify semantically traversable terrains in the image based on the semantic understanding of the scene. This reasoning ability relies on semantic traversability, which is frequently achieved using semantic segmentation models fine-tuned on the testing domain. This fine-tuning process often involves manual data collection with the target robot and annotation by human labelers, which is prohibitively expensive and unscalable. In this work, we present an effective methodology for training a semantic traversability estimator using egocentric videos and an automated annotation process. Egocentric videos are collected from a camera mounted on a pedestrian's chest. The dataset for training the semantic traversability estimator is then automatically generated by extracting semantically traversable regions in each video frame using a recent foundation model in image segmentation and its prompting technique. Extensive experiments with videos taken across several countries and cities, covering diverse urban scenarios, demonstrate the high scalability and generalizability of the proposed annotation method. Furthermore, performance analysis and real-world deployment for autonomous robot navigation showcase that the trained semantic traversability estimator is highly accurate, able to handle diverse camera viewpoints, computationally light, and real-world applicable. The summary video is available at https://youtu.be/EUVoH-wA-lA.