A Hyperparameter Settings of RD
In this section, we describe the hyperparameter settings of RD. For SAC-N-Unc and TD3-N-Unc, M is set to 1/10 of the total training steps. To ensure fairness, algorithms employing RD are implemented using the CORL repository [54]. We derive these backbone algorithms by modifying the original SAC/TD3 algorithm to employ an ensemble of N critics and to incorporate an uncertainty regularization term in the policy update. Additionally, RD with fewer Q ensembles can achieve similar or even better results than the backbone methods with more Q ensembles, indicating its potential to reduce computing resource consumption.
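The snippet does not give the exact form of the uncertainty regularization term. As a minimal sketch, assume the common choice of penalizing the ensemble mean by the ensemble standard deviation; the function names and the weight `beta` below are hypothetical and not taken from the paper or from CORL.

```python
from statistics import mean, pstdev


def penalized_q(q_values, beta=1.0):
    """Ensemble mean minus beta times the ensemble standard deviation.

    q_values: estimates from the N critics for one state-action pair.
    beta: hypothetical penalty weight (an assumption, not from the source).
    """
    if not q_values:
        raise ValueError("need at least one critic estimate")
    return mean(q_values) - beta * pstdev(q_values)


def policy_objective(batch_q_values, beta=1.0):
    """Average penalized value over a batch; the actor would maximize this."""
    return mean(penalized_q(qs, beta) for qs in batch_q_values)


# Two state-action pairs with the same ensemble mean but different disagreement.
agree = [10.0, 10.0, 10.0, 10.0]
disagree = [4.0, 16.0, 4.0, 16.0]

print(penalized_q(agree))       # critics agree, so no penalty is applied
print(penalized_q(disagree))    # critics disagree, so the value is lowered
print(policy_objective([agree, disagree]))
```

Under this assumption, actions on which the critics disagree are scored lower, which is one way an uncertainty term can steer the policy toward well-supported actions.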
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > Canada (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
Supplementary Material for BAIL: Best-Action Imitation Learning for Batch Deep Reinforcement Learning
Note that φ̂ is feasible for the constrained optimization problem. We refer to it as an "early stopping scheme" because the key idea is to return to the parameter values which gave the lowest validation error (see Section 7.8 of Goodfellow et al. [3]). In our implementation, we initialize two upper envelope networks with parameters φ and φ0, where φ is trained using the penalty loss, and φ0 records the parameters with the lowest validation error encountered so far. If L_φ > L_φ0, we count the number of consecutive times this occurs. Not only is this not standard practice, but to make a fair comparison across all algorithms, this would require, for each of the five algorithms, performing a separate hyper-parameter search for each of the five environments.
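The early stopping scheme described above can be sketched roughly as follows. This is an illustration rather than the BAIL implementation: the snippet only says that consecutive non-improving validation checks are counted, so the `patience` threshold, the decision to stop once it is reached, and the helper names `train_step` and `validation_loss` are all assumptions.

```python
import copy


def train_with_early_stopping(params, train_step, validation_loss,
                              max_steps=1000, patience=5):
    """Train `params` while tracking the best-validation copy.

    params plays the role of phi (the trained parameters); best_params plays
    the role of phi0 (lowest validation loss encountered so far).
    patience is a hypothetical threshold on consecutive non-improving checks.
    """
    best_params = copy.deepcopy(params)
    best_loss = validation_loss(best_params)
    bad_count = 0
    for _ in range(max_steps):
        params = train_step(params)
        loss = validation_loss(params)
        if loss > best_loss:
            # Validation got worse than the recorded best: count it.
            bad_count += 1
            if bad_count >= patience:
                break
        else:
            # New best (or tie): record it and reset the counter.
            best_params = copy.deepcopy(params)
            best_loss = loss
            bad_count = 0
    return best_params, best_loss


# Toy problem: the "parameters" are one float that each step increases by 1,
# and the validation loss is lowest at 3.0.
best, loss = train_with_early_stopping(
    0.0,
    train_step=lambda x: x + 1.0,
    validation_loss=lambda x: (x - 3.0) ** 2,
    patience=3,
)
print(best, loss)
```

Training overshoots the minimum, but the function returns the recorded best parameters rather than the last ones, which is the "return to the lowest validation error" idea.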
- North America > United States (0.05)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
SAD-Flower: Flow Matching for Safe, Admissible, and Dynamically Consistent Planning
Huang, Tzu-Yuan, Lederer, Armin, Wu, Dai-Jie, Dai, Xiaobing, Zhang, Sihua, Sosnowski, Stefan, Sun, Shao-Hua, Hirche, Sandra
Flow matching (FM) has shown promising results in data-driven planning. However, it inherently lacks formal guarantees that state and action constraints are satisfied, which is a fundamental requirement for the safety and admissibility of planned trajectories on various systems. Moreover, existing FM planners do not ensure dynamical consistency, which can render trajectories inexecutable. We address these shortcomings by proposing SAD-Flower, a novel framework for generating Safe, Admissible, and Dynamically consistent trajectories. Our approach augments the flow with a virtual control input. This allows principled guidance to be derived using techniques from nonlinear control theory, providing formal guarantees for state constraints, action constraints, and dynamic consistency. Crucially, SAD-Flower operates without retraining, enabling test-time satisfaction of unseen constraints. Through extensive experiments across several tasks, we demonstrate that SAD-Flower outperforms various generative-model-based baselines in ensuring constraint satisfaction.
- Europe > United Kingdom > North Sea > Southern North Sea (0.04)
- North America > United States > Utah (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Information Technology > Artificial Intelligence > Robots (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.92)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)