Appendix A Implementation Details
A.1 More Information About The Continuous Environment

We provide a detailed description of the continuous environments with constrained settings. Consider an optimization problem of the form: minimize α …

After analyzing Table C.1 and Figure C.1, it is evident that B2CL, MEICRL, and InfoGAIL-ICRL … Although MMICRL-LD shows a notable improvement, its performance remains mediocre in environments involving three types of agents. Table C.2 presents the mean ± std results of all algorithms in MuJoCo, computed over 20 runs for each random seed; we use "/" to separate the results for the different settings. Figure C.2 depicts the distribution of x-coordinate values in the Blocked Half-Cheetah, Blocked Swimmer, and Blocked Walker environments, demonstrating the algorithm's capacity to infer and restore …

[Table fragment: feasible cumulative rewards by Method across Settings 1–4; B2CL: 0.24 / 0.40 / …]

Figure C.1: The feasible cumulative rewards (left two columns of the first three rows and the second-to-last row) and constraint violation rate (right two columns of the first three rows and the last row). The first row shows the expert demonstration, followed by the results of the B2CL, MEICRL, InfoGAIL-ICRL, MMICRL-LD, and MMICRL algorithms.
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Based on the principle of optimism in the face of uncertainty (OFU) [56, 49, 10], OFU-RL achieves global optimality by ensuring that the optimistically biased value is close to the real value in the long run. Based on Thompson Sampling [62], Posterior Sampling RL (PSRL) [57, 42, 43] explores by greedily optimizing the policy in an MDP sampled from the posterior distribution over MDPs.
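As a concrete illustration of the posterior-sampling idea, here is a minimal tabular PSRL sketch in Python. It assumes known rewards, a Dirichlet posterior over transition probabilities, and a toy environment exposing reset()/step(); all names (counts, horizon, env) are illustrative and not taken from the cited works.

```python
# Minimal tabular PSRL sketch: sample an MDP from the posterior, plan greedily
# in it, act for one episode, and update the transition counts.
import numpy as np

def sample_transitions(counts):
    """Sample a transition model P[s, a, :] from a Dirichlet posterior (prior Dir(1))."""
    n_states, n_actions, _ = counts.shape
    P = np.zeros_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = np.random.dirichlet(counts[s, a] + 1.0)
    return P

def greedy_policy(P, R, horizon):
    """Finite-horizon value iteration in the sampled MDP; returns a greedy policy."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V            # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        pi[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

def psrl_episode(env, counts, R, horizon):
    """One PSRL episode against a toy env with reset()/step() (assumed interface)."""
    P_hat = sample_transitions(counts)
    pi = greedy_policy(P_hat, R, horizon)
    s = env.reset()
    for t in range(horizon):
        a = pi[t, s]
        s_next, _, done = env.step(a)
        counts[s, a, s_next] += 1   # posterior update for the observed transition
        s = s_next
        if done:
            break
    return counts
```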
Efficient Exploration in Resource-Restricted Reinforcement Learning
Wang, Zhihai, Pan, Taoxing, Zhou, Qi, Wang, Jie
In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable in each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources fast, and the subsequent exploration is then severely restricted due to the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude.
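The abstract does not give RAEB's exact form, so the following is purely a hypothetical illustration of what a resource-aware exploration bonus could look like: a count-based novelty term that is attenuated when an action consumes part of a non-replenishable budget. The class name, parameters, and attenuation rule are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical resource-aware bonus (NOT the paper's RAEB formula):
# novelty bonus for rarely visited states, scaled down for actions that
# spend scarce, non-replenishable resources.
from collections import defaultdict
import numpy as np

class ResourceAwareBonus:
    def __init__(self, beta=1.0, resource_penalty=0.5):
        self.visit_counts = defaultdict(int)  # discretized state -> visit count
        self.beta = beta                      # scale of the novelty bonus
        self.resource_penalty = resource_penalty

    def __call__(self, state_key, resource_cost, resources_left, budget):
        self.visit_counts[state_key] += 1
        novelty = self.beta / np.sqrt(self.visit_counts[state_key])
        if resource_cost <= 0:
            return novelty                    # free actions keep the full bonus
        # Attenuate the bonus for resource-consuming actions, more strongly as
        # the remaining budget shrinks, to discourage wasteful trials.
        scarcity = 1.0 - resources_left / budget
        return novelty * (1.0 - self.resource_penalty * scarcity)
```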
- Leisure & Entertainment > Games (0.34)
- Energy > Oil & Gas > Upstream (0.34)
Factored Adaptation for Non-Stationary Reinforcement Learning
Feng, Fan, Huang, Biwei, Zhang, Kun, Magliacane, Sara
Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns both the causal structure in terms of a factored MDP and a factored representation of the individual time-varying change factors. We prove that under standard assumptions, we can completely recover the causal graph representing the factored transition and reward function, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can consider general non-stationary scenarios with different function types and changing frequencies, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of return, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
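To make the factored-MDP structure concrete, here is a schematic sketch of a transition model in which each next-state component depends only on a masked set of causal parents and on its own latent change factor. It illustrates the structure described above, not the FANS-RL architecture; parent_masks, component_fns, and change_factors are hypothetical names.

```python
# Schematic factored dynamics: next-state component i is produced from its
# causal parents among (state, action) and its individual change factor theta_i,
# which may drift across or within episodes.
import numpy as np

class FactoredDynamics:
    def __init__(self, state_dim, action_dim, parent_masks, component_fns):
        # parent_masks[i]: boolean mask over the concatenated (state, action)
        # vector selecting the causal parents of state component i.
        # component_fns[i]: function (parents, theta_i) -> next value of s_i.
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.parent_masks = parent_masks
        self.component_fns = component_fns

    def step(self, state, action, change_factors):
        inputs = np.concatenate([state, action])
        next_state = np.zeros(self.state_dim)
        for i in range(self.state_dim):
            parents = inputs[self.parent_masks[i]]  # only the causal parents
            next_state[i] = self.component_fns[i](parents, change_factors[i])
        return next_state
```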
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology (0.67)
- Health & Medicine (0.46)
Heteroscedastic Bayesian Optimisation for Stochastic Model Predictive Control
Guzman, Rel, Oliveira, Rafael, Ramos, Fabio
Model predictive control (MPC) has been successful in applications involving the control of complex physical systems. This class of controllers leverages the information provided by an approximate model of the system's dynamics to simulate the effect of control actions. MPC methods also expose a few hyper-parameters whose tuning can be relatively expensive, as it demands interactions with the physical system. We therefore investigate fine-tuning MPC methods in the context of stochastic MPC, which presents extra challenges due to the randomness of the controller's actions. In these scenarios, performance outcomes are noisy, and the noise is not homogeneous across the domain of possible hyper-parameter settings but varies in an input-dependent way. To address these issues, we propose a Bayesian optimisation framework that accounts for heteroscedastic noise to tune hyper-parameters in control problems. Empirical results on benchmark continuous control tasks and a physical robot support the proposed framework's suitability relative to baselines that do not take heteroscedasticity into account.
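As a sketch of the general idea (not the authors' framework), the following Bayesian-optimisation loop handles input-dependent noise by estimating a per-point noise variance from repeated MPC rollouts and placing it on the GP's diagonal via scikit-learn's per-sample alpha. The helper evaluate_mpc, the LCB acquisition, and all parameter values are assumptions for illustration.

```python
# Heteroscedastic BO sketch for tuning a single MPC hyper-parameter:
# repeated rollouts give a per-point noise estimate, which enters the GP
# as an input-dependent diagonal term.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def heteroscedastic_bo(evaluate_mpc, bounds, n_iter=20, n_repeats=5, n_init=4):
    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
    means, variances = [], []
    for x in X:
        costs = evaluate_mpc(float(x[0]), n_repeats)        # repeated noisy rollouts
        means.append(np.mean(costs))
        variances.append(np.var(costs) / n_repeats + 1e-6)  # per-point noise estimate
    for _ in range(n_iter):
        # Per-observation noise variance on the GP diagonal (heteroscedastic).
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      alpha=np.array(variances),
                                      normalize_y=True)
        gp.fit(X, np.array(means))
        # Lower-confidence-bound acquisition (minimising cost) over a candidate grid.
        cand = np.linspace(bounds[0], bounds[1], 512).reshape(-1, 1)
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - 2.0 * sigma)]
        costs = evaluate_mpc(float(x_next[0]), n_repeats)
        X = np.vstack([X, x_next])
        means.append(np.mean(costs))
        variances.append(np.var(costs) / n_repeats + 1e-6)
    best = int(np.argmin(means))
    return float(X[best, 0])
```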
- North America > United States (0.28)
- Oceania > Australia (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)