goal region
The MAGICAL Benchmark for Robust Imitation
The robot could learn from these demonstrations to complete the tasks autonomously. For IL algorithms to be useful, however, they must be able to learn how to perform tasks from few demonstrations. A domestic robot wouldn't be very helpful if it required thirty demonstrations before it figured out that you are deliberately washing your purple cravat
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Singapore (0.04)
LPPG-RL: Lexicographically Projected Policy Gradient Reinforcement Learning with Subproblem Exploration
Qiu, Ruiyu, Wang, Rui, Yang, Guanghui, Li, Xiang, Shao, Zhijiang
Lexicographic multi-objective problems, which consist of multiple conflicting subtasks with explicit priorities, are common in real-world applications. Despite the advantages of Reinforcement Learning (RL) in single tasks, extending conventional RL methods to prioritized multiple objectives remains challenging. In particular, traditional Safe RL and Multi-Objective RL (MORL) methods have difficulty enforcing priority orderings efficiently. Therefore, Lexicographic Multi-Objective RL (LMORL) methods have been developed to address these challenges. However, existing LMORL methods either rely on heuristic threshold tuning with prior knowledge or are restricted to discrete domains. To overcome these limitations, we propose Lexicographically Projected Policy Gradient RL (LPPG-RL), a novel LMORL framework which leverages sequential gradient projections to identify feasible policy update directions, thereby enabling LPPG-RL broadly compatible with all policy gradient algorithms in continuous spaces. LPPG-RL reformulates the projection step as an optimization problem, and utilizes Dykstra's projection rather than generic solvers to deliver great speedups, especially for small- to medium-scale instances. In addition, LPPG-RL introduces Subproblem Exploration (SE) to prevent gradient vanishing, accelerate convergence and enhance stability. We provide theoretical guarantees for convergence and establish a lower bound on policy improvement. Finally, through extensive experiments in a 2D navigation environment, we demonstrate the effectiveness of LPPG-RL, showing that it outperforms existing state-of-the-art continuous LMORL methods.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- Asia > China > Zhejiang Province > Ningbo (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Zhejiang Province > Hangzhou (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
- Asia > Singapore (0.04)
Moving Out: Physically-grounded Human-AI Collaboration
Kang, Xuhui, Lee, Sung-Wook, Liu, Haolin, Wang, Yuyan, Kuo, Yen-Ling
The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. In this paper, we introduce Moving Out, a new human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and maintaining consistent actions to move a big item around a corner. Using Moving Out, we designed two tasks and collected human-human interaction data to evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To address the challenges in physical environments, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. Our experiments show that BASS outperforms state-of-the-art models in AI-AI and human-AI collaboration. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.
A Dynamical Systems Framework for Reinforcement Learning Safety and Robustness Verification
Nasir, Ahmed, Zenati, Abdelhafid
The application of reinforcement learning to safety-critical systems is limited by the lack of formal methods for verifying the robustness and safety of learned policies. This paper introduces a novel framework that addresses this gap by analyzing the combination of an RL agent and its environment as a discrete-time autonomous dynamical system. By leveraging tools from dynamical systems theory, specifically the Finite-Time Lyapunov Exponent (FTLE), we identify and visualize Lagrangian Coherent Structures (LCS) that act as the hidden "skeleton" governing the system's behavior. We demonstrate that repelling LCS function as safety barriers around unsafe regions, while attracting LCS reveal the system's convergence properties and potential failure modes, such as unintended "trap" states. To move beyond qualitative visualization, we introduce a suite of quantitative metrics, Mean Boundary Repulsion (MBR), Aggregated Spurious Attractor Strength (ASAS), and Temporally-Aware Spurious Attractor Strength (TASAS), to formally measure a policy's safety margin and robustness. We further provide a method for deriving local stability guarantees and extend the analysis to handle model uncertainty. Through experiments in both discrete and continuous control environments, we show that this framework provides a comprehensive and interpretable assessment of policy behavior, successfully identifying critical flaws in policies that appear successful based on reward alone.
The MAGICAL Benchmark for Robust Imitation
The robot could learn from these demonstrations to complete the tasks autonomously. For IL algorithms to be useful, however, they must be able to learn how to perform tasks from few demonstrations. A domestic robot wouldn't be very helpful if it required thirty demonstrations before it figured out that you are deliberately washing your purple cravat
- North America > United States > California > Alameda County > Berkeley (0.04)
- North America > Canada (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.71)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Multi-Robot Cooperative Herding through Backstepping Control Barrier Functions
Li, Kang, Li, Ming, Ji, Wenkang, Sun, Zhiyong, Zhao, Shiyu
We propose a novel cooperative herding strategy through backstepping control barrier functions (CBFs), which coordinates multiple herders to herd a group of evaders safely towards a designated goal region. For the herding system with heterogeneous groups involving herders and evaders, the behavior of the evaders can only be influenced indirectly by the herders' motion, especially when the evaders follow an inverse dynamics model and respond solely to repulsive interactions from the herders. This indirect interaction mechanism inherently renders the overall system underactuated. To address this issue, we first construct separate CBFs for the dual objectives of goal reaching and collision avoidance, which ensure both herding completion and safety guarantees. Then, we reformulate the underactuated herding dynamics into a control-affine structure and employ a backstepping approach to recursively design control inputs for the hierarchical barrier functions, avoiding taking derivatives of the higher-order system. Finally, we present a cooperative herding strategy based on backstepping CBFs that allow herders to safely herd multiple evaders into the goal region. In addition, centralized and decentralized implementations of the proposed algorithm are developed, further enhancing its flexibility and applicability. Extensive simulations and real-world experiments validate the effectiveness and safety of the proposed strategy in multi-robot herding.