GCRL
Full-Order Sampling-Based MPC for Torque-Level Locomotion Control via Diffusion-Style Annealing
Xue, Haoru, Pan, Chaoyi, Yi, Zeji, Qu, Guannan, Shi, Guanya
Real-time optimal control with full-order dynamics models for legged robots is challenging due to the high dimensionality and non-convexity of the problem; consequently, Nonlinear Model Predictive Control (NMPC) approaches are often limited to reduced-order models. Sampling-based MPC has shown potential on nonconvex and even discontinuous problems, but it often yields suboptimal solutions with high variance, which limits its application to high-dimensional locomotion. This work introduces DIAL-MPC (Diffusion-Inspired Annealing for Legged MPC), a sampling-based MPC framework with a novel diffusion-style annealing process. The annealing process is supported by a theoretical landscape analysis of Model Predictive Path Integral Control (MPPI) and by the connection between MPPI and single-step diffusion. Algorithmically, DIAL-MPC iteratively refines solutions online and achieves both global coverage and local convergence. In quadrupedal torque-level control tasks, DIAL-MPC reduces the tracking error of standard MPPI by a factor of $13.4$ and outperforms reinforcement learning (RL) policies by $50\%$ on challenging climbing tasks without any training. In particular, DIAL-MPC enables precise real-world quadrupedal jumping with payload. To the best of our knowledge, DIAL-MPC is the first training-free method that optimizes over full-order quadruped dynamics in real time.
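The mechanism is easy to sketch: run MPPI-style updates several times per control step while shrinking the sampling noise, so that early iterations cover the landscape globally and later iterations converge locally, mirroring a diffusion denoising schedule. Below is a minimal sketch of such a diffusion-style annealing loop; the rollout interface, noise schedule, and temperature are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def annealed_mppi_step(u_init, rollout_cost, n_iters=4, n_samples=64,
                       sigma0=1.0, decay=0.5, temperature=1.0, rng=None):
    """One control step of MPPI with a diffusion-style annealing schedule.

    u_init:       (H, m) warm-started control sequence over horizon H.
    rollout_cost: maps a (H, m) control sequence to a scalar cost under the
                  (full-order) dynamics model -- assumed to be provided.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    u = u_init
    for k in range(n_iters):
        sigma = sigma0 * decay ** k                    # anneal the noise scale
        noise = rng.normal(0.0, sigma, (n_samples, *u.shape))
        candidates = u[None] + noise                   # perturbed control sequences
        costs = np.array([rollout_cost(c) for c in candidates])
        w = np.exp(-(costs - costs.min()) / temperature)  # MPPI softmin weights
        u = (w[:, None, None] * candidates).sum(0) / w.sum()
    return u  # apply u[0], then shift and warm-start the next control step
```

Each annealing iteration plays the role of one denoising step: the noise scale $\sigma$ is the analogue of the diffusion noise level, and the weighted average is the analogue of a score-based update.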
Quasimetric Value Functions with Dense Rewards
Valieva, Khadichabonu, Banerjee, Bikramjit
As a generalization of reinforcement learning (RL) to parametrizable goals, goal-conditioned RL (GCRL) has a broad range of applications, particularly in challenging robotics tasks. Recent work has established that the optimal value function of GCRL, $Q^\ast(s,a,g)$, has a quasimetric structure, leading to targeted neural architectures that respect such structure. However, the relevant analyses assume a sparse reward setting -- a known aggravating factor for sample complexity. We show that the key property underpinning a quasimetric, viz., the triangle inequality, is preserved under a dense reward setting as well. Contrary to earlier findings where dense rewards were shown to be detrimental to GCRL, we identify the key condition necessary for the triangle inequality to hold. Dense reward functions that satisfy this condition can only improve, never worsen, sample complexity. This opens up opportunities to train efficient neural architectures with dense rewards, compounding their benefits to sample complexity. We evaluate this proposal in 12 standard benchmark environments in GCRL featuring challenging continuous control tasks. Our empirical results confirm that training a quasimetric value function in our dense reward setting indeed outperforms training with sparse rewards.
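For concreteness, the property at stake can be stated via an induced goal-reaching distance $d(s,g)$ (in the sparse-reward analyses, $d$ is a negation or log-transform of the optimal value, depending on convention). A quasimetric satisfies $d(s,g) \ge 0$, $d(s,s) = 0$, and the triangle inequality

$$d(s,g) \;\le\; d(s,w) + d(w,g) \qquad \text{for all } s, w, g,$$

but, unlike a metric, it need not be symmetric: $d(s,g) \ne d(g,s)$ in general, since reaching a goal can be easier in one direction than the other (e.g., descending versus climbing). The paper's claim is that a dense reward satisfying its key condition leaves this inequality intact.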
Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression
Ma, Yecheng Jason, Yan, Jason, Jayaraman, Dinesh, Bastani, Osbert
Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose Goal-conditioned f-Advantage Regression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reaching task can be formulated as a state-occupancy matching problem between a dynamics-abiding imitator agent and an expert agent that directly teleports to the goal. In contrast to prior approaches, GoFAR does not require any hindsight relabeling and enjoys uninterleaved optimization of its value and policy networks. These distinct features confer on GoFAR much better offline performance and stability, as well as a statistical performance guarantee that is unattainable for prior methods. Furthermore, we demonstrate that GoFAR's training objectives can be re-purposed to learn an agent-independent goal-conditioned planner from purely offline source-domain data, which enables zero-shot transfer to new target domains.
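The "uninterleaved optimization" above means the value function can be trained to convergence on its own and the policy extracted afterwards by weighted regression, with no hindsight relabeling anywhere. Here is a minimal sketch of that recipe; we assume a $\chi^2$-divergence for $f$ (giving the clipped-linear weight below), and `V` and `pi` are hypothetical networks, so this illustrates the structure of the objectives rather than the authors' implementation.

```python
import torch

def gofar_style_losses(V, pi, batch, gamma=0.99):
    """Sketch of f-advantage regression with an assumed chi^2 divergence.

    batch: tensors s, a, s_next, g, r (goal-reaching reward), s0 (initial states).
    V(s, g) returns a scalar value; pi(s, g) returns an action distribution.
    """
    s, a, s_next, g, r, s0 = (batch[k] for k in ("s", "a", "s_next", "g", "r", "s0"))

    # Stage 1: train V alone -- no interleaving with the policy.
    adv = r + gamma * V(s_next, g) - V(s, g)
    # chi^2 convex conjugate: f*(y) = y + y^2/4 (assumed form)
    v_loss = (1 - gamma) * V(s0, g).mean() + (adv + adv.pow(2) / 4).mean()

    # Stage 2: policy extraction by advantage-weighted regression on frozen
    # advantages; note the absence of hindsight relabeling.
    w = torch.clamp(1 + adv.detach() / 2, min=0.0)   # (f*)' for chi^2 (assumed)
    pi_loss = -(w * pi(s, g).log_prob(a)).mean()
    return v_loss, pi_loss
```

In practice one would minimize `v_loss` to convergence first and then minimize `pi_loss`; the two stages never alternate, which is the stability advantage the abstract refers to.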
Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL
Kim, Sungyoon, Choi, Yunseon, Matsunaga, Daiki E., Kim, Kee-Eung
Offline Goal-Conditioned Reinforcement Learning (Offline GCRL) is an important problem in RL that focuses on acquiring diverse goal-oriented skills solely from pre-collected behavior datasets. In this setting, reward feedback is typically absent except when the goal is achieved, which makes it difficult to learn policies, especially from a finite dataset of suboptimal behaviors. In addition, realistic scenarios involve long-horizon planning, which necessitates extracting useful skills from sub-trajectories. Recently, conditional diffusion models have been shown to be a promising approach for generating high-quality long-horizon plans for RL. However, their practicality in the goal-conditioned setting is still limited by a number of technical assumptions made by these methods. In this paper, we propose SSD (Sub-trajectory Stitching with Diffusion), a model-based offline GCRL method that leverages the conditional diffusion model to address these limitations. In summary, we use a diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset. We report state-of-the-art performance on the standard benchmark set of GCRL tasks, and demonstrate the capability to stitch segments of suboptimal trajectories in the offline data into high-quality plans.
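As a rough illustration of generating a plan conditioned on a goal and a target value, here is a DDPM-style ancestral sampling loop; the denoiser `eps_net`, the noise schedule, and the conditioning interface are assumptions made for the sketch, not SSD's actual design.

```python
import torch

@torch.no_grad()
def sample_plan(eps_net, goal, target_value, horizon=32, state_dim=17, n_steps=50):
    """Generate a sub-trajectory plan by conditional diffusion sampling.
    All names and dimensions here are illustrative assumptions."""
    betas = torch.linspace(1e-4, 0.02, n_steps)
    alphas = 1.0 - betas
    alpha_bar = torch.cumprod(alphas, dim=0)

    x = torch.randn(1, horizon, state_dim)            # start from pure noise
    for t in reversed(range(n_steps)):
        eps = eps_net(x, t, goal, target_value)       # conditional noise prediction
        coef = betas[t] / torch.sqrt(1.0 - alpha_bar[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])  # posterior mean
        if t > 0:                                     # add noise except at the last step
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x  # (1, horizon, state_dim) plan, to be tracked by a low-level controller
```

Conditioning on an estimated target value, rather than on the goal alone, is what lets the sampler prefer plans that stitch together the better parts of suboptimal trajectories.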
ReRoGCRL: Representation-based Robustness in Goal-Conditioned Reinforcement Learning
Yin, Xiangyu, Wu, Sihao, Liu, Jiaxu, Fang, Meng, Zhao, Xingyu, Huang, Xiaowei, Ruan, Wenjie
While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its robustness against adversarial perturbations remains largely unexplored. Attacks and robust representation training methods designed for traditional RL become less effective when applied to GCRL. To address this challenge, we first propose the Semi-Contrastive Representation attack, a novel approach inspired by the adversarial contrastive attack. Unlike existing attacks in RL, it requires only information from the policy function and can be seamlessly deployed at test time. Then, to mitigate the vulnerability of existing GCRL algorithms, we introduce Adversarial Representation Tactics, which combines Semi-Contrastive Adversarial Augmentation with a Sensitivity-Aware Regularizer to improve the adversarial robustness of the underlying RL agent against various types of perturbations. Extensive experiments validate the superior performance of our attack and defence methods across multiple state-of-the-art GCRL algorithms. Our tool ReRoGCRL is available at https://github.com/TrustAI/ReRoGCRL.
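The attack's distinguishing feature is that it needs only the policy network, with no access to a value function or reward. A plausible reading, offered purely as a hedged sketch rather than the paper's exact objective, is a PGD-style perturbation that pushes the policy's output on the perturbed observation away from its output on the clean one:

```python
import torch

def policy_only_attack(policy, obs, goal, eps=0.05, n_steps=10):
    """PGD-style observation attack using only the policy network: maximize
    the discrepancy between policy(obs + delta) and policy(obs). A generic
    sketch of the idea, not the exact Semi-Contrastive Representation attack."""
    with torch.no_grad():
        clean_action = policy(obs, goal)              # anchor output on clean input
    delta = torch.zeros_like(obs, requires_grad=True)
    step = 2.5 * eps / n_steps
    for _ in range(n_steps):
        loss = -(policy(obs + delta, goal) - clean_action).pow(2).mean()
        loss.backward()
        with torch.no_grad():
            delta -= step * delta.grad.sign()         # ascend the output discrepancy
            delta.clamp_(-eps, eps)                   # stay inside the L-inf ball
        delta.grad.zero_()
    return (obs + delta).detach()
```

Because only forward passes through the policy are needed, such an attack can run at deployment time, which is exactly the threat model the abstract highlights.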
Score Models for Offline Goal-Conditioned Reinforcement Learning
Sikchi, Harshit, Chitnis, Rohan, Touati, Ahmed, Geramifard, Alborz, Zhang, Amy, Niekum, Scott
Offline Goal-Conditioned Reinforcement Learning (GCRL) is tasked with learning to achieve multiple goals in an environment purely from offline datasets using sparse reward functions. Offline GCRL is pivotal for developing generalist agents capable of leveraging pre-existing datasets to learn diverse and reusable skills without hand-engineering reward functions. However, contemporary approaches to GCRL based on supervised learning and contrastive learning are often suboptimal in the offline setting. An alternative perspective on GCRL optimizes for occupancy matching, but necessitates learning a discriminator, which subsequently serves as a pseudo-reward for downstream RL. Inaccuracies in the learned discriminator can cascade, negatively influencing the resulting policy. We present a novel approach to GCRL under the lens of mixture-distribution matching, leading to our discriminator-free method, SMORe. The key insight is combining the occupancy-matching perspective of GCRL with a convex dual formulation to derive a learning objective that can better leverage suboptimal offline data. SMORe learns scores, or unnormalized densities, representing the importance of taking an action at a state for reaching a particular goal. SMORe is principled, and our extensive experiments on the fully offline GCRL benchmark, composed of robot manipulation and locomotion tasks including high-dimensional observations, show that SMORe outperforms state-of-the-art baselines by a significant margin.
Goal-Conditioned Reinforcement Learning with Disentanglement-based Reachability Planning
Qian, Zhifeng, You, Mingyu, Zhou, Hongjun, Xu, Xuanhui, He, Bin
Goal-Conditioned Reinforcement Learning (GCRL) can enable agents to spontaneously set diverse goals and thereby learn a set of skills. Despite excellent work in various fields, reaching distant goals in temporally extended tasks remains a challenge for GCRL. Current works tackle this problem by leveraging planning algorithms that propose intermediate subgoals to augment GCRL. Such methods rely on two crucial components: (i) a state representation space in which to search for valid subgoals, and (ii) a distance function to measure the reachability of subgoals. However, existing methods struggle to scale to high-dimensional state spaces due to their non-compact representations. Moreover, they cannot collect high-quality training data through standard GC policies, which results in an inaccurate distance function. Both issues affect the efficiency and performance of planning and policy learning. In this paper, we propose a goal-conditioned RL algorithm combined with Disentanglement-based Reachability Planning (REPlan) to solve temporally extended tasks. In REPlan, a Disentangled Representation Module (DRM) learns compact representations that disentangle robot poses and object positions from high-dimensional observations in a self-supervised manner. A simple REachability discrimination Module (REM) is also designed to determine the temporal distance of subgoals. Moreover, REM computes intrinsic bonuses to encourage the collection of novel states for training. We evaluate REPlan in three vision-based simulation tasks and one real-world task. The experiments demonstrate that REPlan significantly outperforms prior state-of-the-art methods in solving temporally extended tasks.
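One simple way to realize a reachability module like REM, offered as a hedged sketch (the classifier setup and the bonus form are our assumptions): train a classifier on state pairs sampled along trajectories, labeled by whether they lie within k steps of each other, and grant an exploration bonus for states the classifier deems hard to reach.

```python
import torch
import torch.nn as nn

class ReachabilityModule(nn.Module):
    """Sketch: predicts P(goal reachable from state within k steps), operating
    on compact (disentangled) representations. Assumed design, not REM itself."""
    def __init__(self, repr_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * repr_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, z_state, z_goal):
        return torch.sigmoid(self.net(torch.cat([z_state, z_goal], dim=-1)))

    def intrinsic_bonus(self, z_state, z_goal, scale=0.1):
        # Hard-to-reach states receive a larger exploration bonus (assumed form).
        with torch.no_grad():
            return scale * (1.0 - self.forward(z_state, z_goal)).squeeze(-1)
```

A planner can then score candidate subgoals by the predicted reachability from the current state, keeping only subgoals that are plausibly within reach.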
Generative Causal Representation Learning for Out-of-Distribution Motion Forecasting
Bagi, Shayan Shirahmad Gale, Gharaee, Zahra, Schulte, Oliver, Crowley, Mark
Conventional supervised learning methods typically assume i.i.d. samples and are sensitive to out-of-distribution (OOD) data. We propose Generative Causal Representation Learning (GCRL), which leverages causality to facilitate knowledge transfer under distribution shifts. While we evaluate the effectiveness of our proposed method on human trajectory prediction models, GCRL can be applied to other domains as well. First, we propose a novel causal model that explains the generative factors in motion forecasting datasets using features that are common across all environments together with features that are specific to each environment. Selection variables are used to determine which parts of the model can be directly transferred to a new environment without fine-tuning. Second, we propose an end-to-end variational learning paradigm to learn the causal mechanisms that generate observations from features. GCRL is supported by strong theoretical results that imply identifiability of the causal model under certain assumptions. Experimental results on synthetic and real-world motion forecasting datasets show the robustness and effectiveness of our proposed method for knowledge transfer under zero-shot and low-shot settings, substantially outperforming prior motion forecasting models on out-of-distribution prediction. Our code is available at https://github.com/sshirahmad/GCRL.
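The transfer story rests on splitting the learned factors into an environment-invariant block and an environment-specific block, with selection variables deciding which parts carry over. As a structural sketch only (the hard two-branch split and all sizes are our assumptions, not the paper's exact model):

```python
import torch
import torch.nn as nn

class SharedSpecificEncoder(nn.Module):
    """Sketch: encode observations into invariant + environment-specific latents.
    At transfer time the invariant branch is reused frozen, and only the
    environment-specific branch is adapted. Assumed design for illustration."""
    def __init__(self, obs_dim, z_inv_dim=16, z_spec_dim=8, hidden=128):
        super().__init__()
        self.inv_enc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, z_inv_dim))
        self.spec_enc = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU(),
                                      nn.Linear(hidden, z_spec_dim))

    def forward(self, x):
        return self.inv_enc(x), self.spec_enc(x)

# Transfer sketch: freeze the invariant branch, adapt only the specific branch.
enc = SharedSpecificEncoder(obs_dim=32)
for p in enc.inv_enc.parameters():
    p.requires_grad_(False)
```

The selection variables in the paper play the role that the hard freeze plays here, deciding per-mechanism what transfers; the sketch collapses that decision into a fixed two-way split.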
Metric Residual Networks for Sample Efficient Goal-Conditioned Reinforcement Learning
Liu, Bo, Feng, Yihao, Liu, Qiang, Stone, Peter
Goal-conditioned reinforcement learning (GCRL) has a wide range of potential real-world applications, including manipulation and navigation problems in robotics. Especially in such robotics tasks, sample efficiency is of the utmost importance for GCRL since, by default, the agent is only rewarded when it reaches its goal. While several methods have been proposed to improve the sample efficiency of GCRL, one relatively under-studied approach is the design of neural architectures to support sample efficiency. In this work, we introduce a novel neural architecture for GCRL that achieves significantly better sample efficiency than the commonly used monolithic network architecture. The key insight is that the optimal action-value function $Q^\ast(s,a,g)$ must satisfy the triangle inequality in a specific sense. Furthermore, we introduce the metric residual network (MRN), which deliberately decomposes the action-value function $Q(s,a,g)$ into the negated sum of a metric and a residual asymmetric component. MRN provably approximates any optimal action-value function $Q^\ast(s,a,g)$, thus making it a fitting neural architecture for GCRL. We conduct comprehensive experiments across 12 standard benchmark environments in GCRL. The empirical results demonstrate that MRN uniformly outperforms other state-of-the-art GCRL neural architectures in terms of sample efficiency.
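The decomposition translates directly into an architecture: embed $(s,a)$ and $g$, compute a symmetric metric term plus an asymmetric residual, and negate the sum. The sketch below follows that recipe with a Euclidean metric and a max-ReLU residual (which itself satisfies the triangle inequality); these concrete choices are assumptions for illustration, not necessarily the exact MRN parameterization.

```python
import torch
import torch.nn as nn

class MetricResidualQ(nn.Module):
    """Q(s,a,g) = -(metric + asymmetric residual) between embeddings of (s,a)
    and g, following the decomposition in the abstract. Terms are assumed."""
    def __init__(self, s_dim, a_dim, g_dim, emb=64, hidden=256):
        super().__init__()
        self.sa_enc = nn.Sequential(nn.Linear(s_dim + a_dim, hidden), nn.ReLU(),
                                    nn.Linear(hidden, emb))
        self.g_enc = nn.Sequential(nn.Linear(g_dim, hidden), nn.ReLU(),
                                   nn.Linear(hidden, emb))

    def forward(self, s, a, g):
        x = self.sa_enc(torch.cat([s, a], dim=-1))
        y = self.g_enc(g)
        sym = torch.norm(x - y, dim=-1)              # symmetric metric term
        asym = torch.relu(x - y).max(dim=-1).values  # asymmetric residual
        return -(sym + asym)                         # negated sum, as described
```

The asymmetric term is what lets the network express that reaching $g$ from $s$ can cost more than the reverse, which a plain metric cannot capture.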