motor skill
BuilderBench -- A benchmark for generalist agents
Ghugare, Raj, Ji, Catherine, Wantlin, Kathryn, Schofield, Jin, Eysenbach, Benjamin
Today's AI models learn primarily through mimicry and sharpening, so it is not surprising that they struggle to solve problems beyond the limits set by existing data. To solve novel problems, agents should acquire skills for exploring and learning through experience. Finding a scalable learning mechanism for developing agents that learn through interaction remains a major open problem. In this work, we introduce BuilderBench, a benchmark to accelerate research into agent pre-training that centers open-ended exploration. BuilderBench requires agents to learn how to build any structure using blocks. BuilderBench is equipped with (1) a hardware-accelerated simulator of a robotic agent interacting with various physical blocks, and (2) a task suite with over 42 diverse target structures that are carefully curated to test an understanding of physics, mathematics, and long-horizon planning. During training, agents have to explore and learn general principles about the environment without any external supervision. During evaluation, agents have to build the unseen target structures from the task suite. Solving these tasks requires a sort of embodied reasoning that is reflected not in words but in actions: experimenting with different strategies and piecing them together. Our experiments show that many of these tasks challenge the current generation of algorithms. Hence, we also provide a "training wheels" protocol, in which agents are trained and evaluated to build a single target structure from the task suite. Finally, we provide single-file implementations of six different algorithms as a reference point for researchers.
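As a rough illustration of what "building a target structure" could mean at evaluation time, the sketch below checks whether placed blocks match a target structure within a position tolerance. The function name `structure_built`, the greedy nearest-neighbour matching, and the 0.01 tolerance are assumptions for illustration, not BuilderBench's actual success criterion.

```python
import math

def structure_built(placed, target, tol=0.01):
    """Hypothetical success check: every target block position must be matched
    by a distinct placed block within `tol` (same units as the positions).

    Blocks are treated as interchangeable, so a greedy nearest-neighbour
    assignment is used; an optimal assignment could accept a few layouts
    that this greedy version rejects.
    """
    if len(placed) < len(target):
        return False
    unused = list(placed)
    for goal in target:
        # nearest still-unmatched placed block to this target position
        best = min(unused, key=lambda p: math.dist(p, goal))
        if math.dist(best, goal) > tol:
            return False
        unused.remove(best)
    return True

# Two-block tower target, positions as (x, y, z) in metres (made-up values).
tower = [(0.0, 0.0, 0.02), (0.0, 0.0, 0.06)]
placed = [(0.001, 0.0, 0.06), (0.0, 0.002, 0.02)]
print(structure_built(placed, tower))  # True
```

Because the check ignores block identity, any arrangement that fills the target positions counts as a success, which matches the idea of building a structure rather than replaying a fixed sequence of placements.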
Let Humanoids Hike! Integrative Skill Development on Complex Trails
Hiking on complex trails demands balance, agility, and adaptive decision-making over unpredictable terrain. Current humanoid research remains fragmented and inadequate for hiking: locomotion research focuses on motor skills without long-term goals or situational awareness, while semantic navigation overlooks real-world embodiment and local terrain variability. We propose training humanoids to hike on complex trails, driving integrative skill development across visual perception, decision making, and motor execution. We develop a learning framework, LEGO-H, that enables a vision-equipped humanoid robot to hike complex trails autonomously. We introduce two technical innovations: 1) a temporal vision transformer variant, tailored to a Hierarchical Reinforcement Learning framework, that anticipates future local goals to guide movement, seamlessly integrating locomotion with goal-directed navigation; and 2) latent representations of joint movement patterns, combined with hierarchical metric learning, that enhance the Privileged Learning scheme and enable smooth policy transfer from privileged training to onboard execution. These components allow LEGO-H to handle diverse physical and environmental challenges without relying on predefined motion patterns. Experiments across varied simulated trails and robot morphologies highlight LEGO-H's versatility and robustness, positioning hiking as a compelling testbed for embodied autonomy and LEGO-H as a baseline for future humanoid development.
ModSkill: Physical Character Skill Modularization
Huang, Yiming, Dou, Zhiyang, Liu, Lingjie
Human motion is highly diverse and dynamic, posing challenges for imitation learning algorithms that aim to generalize motor skills for controlling simulated characters. Previous methods typically rely on a universal full-body controller for tracking reference motion (tracking-based model) or a unified full-body skill embedding space (skill embedding). However, these approaches often struggle to generalize and scale to larger motion datasets. In this work, we introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts. Our framework features a skill modularization attention layer that processes policy observations into modular skill embeddings that guide low-level controllers for each body part. We also propose an Active Skill Learning approach with Generative Adaptive Sampling, using large motion generation models to adaptively enhance policy learning in challenging tracking scenarios. Our results show that this modularized skill learning framework, enhanced by generative sampling, outperforms existing methods in precise full-body motion tracking and enables reusable skill embeddings for diverse goal-driven tasks.
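The abstract does not spell out the internals of the skill modularization attention layer. One plausible reading, sketched below under that assumption, gives each body part its own query vector that attends over the observation tokens, yielding one embedding per body part that would then condition that part's low-level controller. The body-part names, the dimensions, and the use of single-head scaled dot-product attention without learned projections are all illustrative assumptions.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def modular_skill_embeddings(part_queries, obs_tokens):
    """Single-head scaled dot-product attention with one query per body part.

    part_queries: dict of body-part name -> query vector of length d
    obs_tokens:   list of observation token vectors of length d, used as both
                  keys and values to keep the sketch small (a real layer would
                  apply learned key/value projections first).
    Returns a dict of body-part name -> embedding vector of length d.
    """
    d = len(obs_tokens[0])
    embeddings = {}
    for part, q in part_queries.items():
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in obs_tokens]
        weights = softmax(scores)
        # attention-weighted sum of the value vectors
        embeddings[part] = [sum(w * v[i] for w, v in zip(weights, obs_tokens)) for i in range(d)]
    return embeddings

tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy observation tokens
queries = {"left_arm": [10.0, 0.0], "torso": [0.0, 0.0]}
emb = modular_skill_embeddings(queries, tokens)
print(emb["torso"])  # [0.5, 0.5]: a zero query attends uniformly
```

The point of per-part queries is that each body part can pull different information out of the same observation, which is one way to obtain modular rather than monolithic full-body skill embeddings.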
Using Machine Teaching to Boost Novices' Robot Teaching Skill
Zhu, Yuqing, Sun, Endong, Howard, Matthew
Recent evidence has shown that, contrary to expectations, it is difficult for users, especially novices, to teach robots tasks through learning from demonstration (LfD). This paper introduces a framework that leverages machine teaching algorithms to train novices to become better teachers of robots, and verifies whether such teaching ability is (i) retained beyond the period of training and (ii) generalises such that novices teach robots more effectively, even for skills for which training has not been received. A between-subjects study is reported, in which novice teachers are asked to teach simple motor skills to a robot. The results demonstrate that subjects who receive training show an average 78.83% improvement in teaching ability (as measured by the accuracy of the skill learnt by the robot), and an average 63.69% improvement in the teaching of new skills not included as part of the training. The proposed approach allows human teachers to be trained to teach robots dynamic motor skills using machine teaching.
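The abstract reports improvements as percentages without stating the formula. One common definition, assumed here, is the relative change with respect to the untrained (control) group; if skill accuracy is scored as an error, the relative reduction in error plays the same role. The function name and the numbers in the example are made up for illustration and are not from the study.

```python
def relative_improvement(baseline, trained, higher_is_better=True):
    """Percentage improvement of `trained` over `baseline` (assumed definition).

    For an error-like metric (lower is better), improvement is the relative
    reduction in error instead of the relative increase in score.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (trained - baseline) if higher_is_better else (baseline - trained)
    return 100.0 * change / abs(baseline)

# Illustrative values only: error falls from 0.20 (untrained) to 0.05 (trained).
print(round(relative_improvement(0.20, 0.05, higher_is_better=False), 2))  # 75.0
```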
ATLAS: Improving Lay Summarisation with Attribute-based Control
Zhang, Zhihao, Goldsack, Tomas, Scarton, Carolina, Lin, Chenghua
Lay summarisation aims to produce summaries of scientific articles that are comprehensible to non-expert audiences. However, previous work assumes a one-size-fits-all approach, where the content and style of the produced summary are entirely dependent on the data used to train the model. In practice, audiences with different levels of expertise will have specific needs, impacting what content should appear in a lay summary and how it should be presented. Aiming to address this, we propose ATLAS, a novel abstractive summarisation approach that can control various properties that contribute to the overall "layness" of the generated summary using targeted control attributes. We evaluate ATLAS on a combination of biomedical lay summarisation datasets, where it outperforms state-of-the-art baselines using mainstream summarisation metrics. Additional analyses provided on the discriminatory power and emergent influence of our selected controllable attributes further attest to the effectiveness of our approach.
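Attribute-based control of this kind is commonly implemented by discretising each attribute value into bins and prepending one control token per attribute to the model input. The sketch below shows only that input-construction step; the attribute names, bin edges, and `<name_bin>` token format are illustrative assumptions rather than ATLAS's exact configuration, and the summarisation model itself is omitted.

```python
from bisect import bisect_right

# Hypothetical attributes and bin edges; each value falls into one of len(edges) + 1 bins.
BIN_EDGES = {
    "length": [100, 200, 300],          # summary length in words
    "readability": [10.0, 13.0, 16.0],  # e.g., a grade-level readability score
}

def control_prefix(attributes):
    """Map raw attribute values to discrete control tokens, one per attribute."""
    tokens = []
    for name in sorted(attributes):
        bin_id = bisect_right(BIN_EDGES[name], attributes[name])
        tokens.append(f"<{name}_{bin_id}>")
    return " ".join(tokens)

def build_input(article, attributes):
    # During training the values would be measured on the reference summary;
    # at inference time a reader's preferences can set them to steer "layness".
    return control_prefix(attributes) + " " + article

print(build_input("Cells divide by mitosis.", {"length": 150, "readability": 12.0}))
# <length_1> <readability_1> Cells divide by mitosis.
```

Binning keeps the control vocabulary small, so the model sees each control token often enough during training to learn its effect.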
Dexterous Legged Locomotion in Confined 3D Spaces with Reinforcement Learning
Xu, Zifan, Raj, Amir Hossain, Xiao, Xuesu, Stone, Peter
Recent advances in locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns produced by existing RL-based methods, which learn parameterized locomotion skills characterized by motion parameters such as velocity and body height, may not be adequate for navigating robots through challenging confined 3D spaces, which require both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner, tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.
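The planner/policy split described above can be illustrated with a classical grid planner that produces waypoints and a simple rule that hands the learned policy the next waypoint as its local goal. In this sketch, breadth-first search on a 2D occupancy grid stands in for the paper's classical planner (which the abstract does not name), the helper names, waypoint spacing, and `reach_radius` are hypothetical, and the RL policy itself is left out.

```python
import math
from collections import deque

def plan_path(grid, start, goal):
    """Breadth-first search on a 4-connected occupancy grid (1 = blocked).

    Returns the list of cells from start to goal, or [] if unreachable.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {start: None}
    frontier = deque([start])
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            break
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0 and nxt not in parent:
                parent[nxt] = cell
                frontier.append(nxt)
    if goal not in parent:
        return []
    path, cell = [], goal
    while cell is not None:  # walk parent links back to the start
        path.append(cell)
        cell = parent[cell]
    return path[::-1]

def waypoints(path, spacing=3):
    """Subsample the path into waypoints, always keeping the final goal."""
    if not path:
        return []
    wps = path[spacing::spacing]
    if not wps or wps[-1] != path[-1]:
        wps.append(path[-1])
    return wps

def current_local_goal(position, wps, index, reach_radius=0.5):
    """Advance past waypoints the robot is already within reach_radius of.

    Returns (local_goal, index); the local goal is what the RL policy would track.
    """
    while index < len(wps) - 1 and math.dist(position, wps[index]) <= reach_radius:
        index += 1
    return wps[index], index

grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],  # a wall with a gap on the right
    [0, 0, 0, 0],
]
path = plan_path(grid, (0, 0), (2, 0))
wps = waypoints(path, spacing=3)
print(wps)  # [(0, 3), (2, 2), (2, 0)]
```

Feeding the policy only nearby waypoints keeps each local goal within a short horizon, which is the inefficiency of tracking distant goals that the hierarchy is meant to avoid.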
Mimicking the Maestro: Exploring the Efficacy of a Virtual AI Teacher in Fine Motor Skill Acquisition
Mulian, Hadar, Shlomov, Segev, Limonad, Lior, Noccaro, Alessia, Buscaglione, Silvia
Motor skills, especially fine motor skills like handwriting, play an essential role in academic pursuits and everyday life. Traditional methods for teaching these skills, although effective, can be time-consuming and inconsistent. With the rise of advanced technologies like robotics and artificial intelligence, there is increasing interest in automating such teaching processes using these technologies, via human-robot and human-computer interactions. In this study, we examine the potential of a virtual AI teacher in emulating the techniques of human educators for motor skill acquisition. We introduce an AI teacher model that captures the distinct characteristics of human instructors. Using a Reinforcement Learning environment tailored to mimic teacher-learner interactions, we tested our AI model against four guiding hypotheses, emphasizing improved learner performance, an enhanced rate of skill acquisition, and reduced variability in learning outcomes. Our findings, validated on synthetic learners, revealed significant improvements across all tested hypotheses. Notably, our model showed robustness across different learners and settings and demonstrated adaptability to handwriting. This research underscores the potential of integrating Reinforcement Learning and Imitation Learning models with robotics to revolutionize the teaching of critical motor skills.
AI expert shares insights on creating robot with physical capabilities to beat humans in popular game
Artificial intelligence has been able to beat masters at games like chess, poker, and Go. AI has also been able to beat human competitors in various video games. While impressive, these games do not require one major capability of the AI: physical skill. CyberRunner is an AI tasked with learning how to play the popular labyrinth maze game.
From Rolling Over to Walking: Enabling Humanoid Robots to Develop Complex Motor Skills
This paper presents an innovative method for humanoid robots to acquire a comprehensive set of motor skills through reinforcement learning. The approach utilizes an achievement-triggered multi-path reward function rooted in developmental robotics principles, enabling the robot to learn, within a single training phase, the gross motor skills typically mastered by human infants. The proposed method outperforms standard reinforcement learning techniques in success rates and learning speed within a simulation environment. By leveraging the principles of self-discovery and exploration integral to infant learning, this method could significantly advance humanoid robot motor skill acquisition.
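The abstract does not define the reward, so the following is only one hypothetical reading of an "achievement-triggered multi-path" reward. Motor skills form a prerequisite graph; a skill contributes a dense shaping term once its prerequisites are achieved, and reaching it pays a one-time bonus that can unlock several follow-up skills at once. The class names, skill names, state keys, thresholds, and bonus value are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, float]

@dataclass
class Achievement:
    name: str
    shaping: Callable[[State], float]  # dense reward guiding toward this skill
    achieved: Callable[[State], bool]  # test that triggers the unlock
    requires: List[str] = field(default_factory=list)  # prerequisite skills

class AchievementReward:
    """Hypothetical achievement-triggered reward over a graph of motor skills.

    A skill contributes its shaping reward only once all its prerequisites have
    been achieved; reaching it pays a one-time bonus and may unlock several
    follow-up skills, so more than one path through the graph can be active.
    """

    def __init__(self, achievements: List[Achievement], bonus: float = 10.0):
        self.achievements = achievements
        self.bonus = bonus
        self.unlocked = set()  # names of skills already achieved

    def __call__(self, state: State) -> float:
        done_before = set(self.unlocked)  # prerequisites are judged at step start
        total = 0.0
        for a in self.achievements:
            if a.name in done_before or not all(r in done_before for r in a.requires):
                continue  # already achieved, or still locked
            total += a.shaping(state)
            if a.achieved(state):
                self.unlocked.add(a.name)
                total += self.bonus
        return total

reward = AchievementReward([
    Achievement("roll_over", lambda s: s["roll"], lambda s: s["roll"] > 0.9),
    Achievement("stand", lambda s: s["height"], lambda s: s["height"] > 0.8,
                requires=["roll_over"]),
])
print(reward({"roll": 0.5, "height": 0.1}))  # 0.5: only rolling over is active
```

Because the curriculum advances inside the reward function itself, training can run as a single phase, with later skills switching on automatically as earlier ones are mastered.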
Universal Humanoid Motion Representations for Physics-Based Control
Luo, Zhengyi, Cao, Jinkun, Merel, Josh, Winkler, Alexander, Huang, Jing, Kitani, Kris, Xu, Weipeng
We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. Due to the high dimensionality of humanoid control as well as the inherent difficulties in reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles (e.g. locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability to complex tasks. Our work closes this gap, significantly increasing the coverage of the motion representation space. To achieve this, we first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator. This is achieved using an encoder-decoder structure with a variational information bottleneck. Additionally, we jointly learn a prior conditioned on proprioception (the humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks. Sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks using natural and realistic human behavior. We demonstrate the effectiveness of our motion representation by solving generative tasks (e.g. strike, terrain traversal) and motion tracking using VR controllers.
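Training a variational bottleneck against a learned, proprioception-conditioned prior typically involves a KL term between the encoder's posterior and that prior. Assuming both are diagonal Gaussians (a common choice, not confirmed by the abstract), the KL has the closed form sketched below; the function name is hypothetical, and the encoder and prior networks that would output the means and log-variances are omitted.

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) for diagonal Gaussians given means and log-variances, summed over dims.

    q would be the encoder posterior and p the proprioception-conditioned prior.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # per-dimension closed form:
        # 0.5 * (log var_p - log var_q + (var_q + (mu_q - mu_p)^2) / var_p - 1)
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

print(kl_diag_gaussians([1.0], [0.0], [0.0], [0.0]))  # 0.5
```

In a setup like this, the KL term would usually be added, with some weight, to the imitation or reconstruction loss. Pulling the posterior toward a state-dependent prior is what makes sampling directly from the prior yield plausible motions later.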