Appendix A Implementation Details
A.1 More Information About The Continuous Environment

We provide a detailed description of the continuous environments with constrained settings. Consider an optimization problem of the form: minimize α …

After analyzing Table C.1 and Figure C.1, it is evident that B2CL, MEICRL, and InfoGAIL-ICRL … Although MMICRL-LD shows a notable improvement, its performance remains mediocre in environments involving three types of agents. Table C.2 presents the mean ± std results of all algorithms in MuJoCo, computed over 20 runs for each random seed; we use "/" to separate the results for the different settings. Figure C.2 depicts the distribution of x-coordinate values in the Blocked Half-Cheetah, Blocked Swimmer, and Blocked Walker environments, demonstrating the algorithm's capacity to infer and restore …

[Table fragment: feasible cumulative rewards by Method across Settings 1–4; B2CL: 0.24 / 0.40 / …]

Figure C.1: The feasible cumulative rewards (left two columns of the first three rows and the second-to-last row) and constraint violation rate (right two columns of the first three rows and the last row). The first row shows the expert demonstration, followed by the results of the B2CL, MEICRL, InfoGAIL-ICRL, MMICRL-LD, and MMICRL algorithms.
Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning
Based on the principle of optimism in the face of uncertainty (OFU) [56, 49, 10], OFU-RL achieves global optimality by ensuring that the optimistically biased value is close to the real value in the long run. Based on Thompson Sampling [62], Posterior Sampling RL (PSRL) [57, 42, 43] explores by greedily optimizing the policy in an MDP sampled from the posterior distribution over MDPs.
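As a concrete illustration of the posterior-sampling idea, here is a minimal tabular PSRL sketch in Python. It assumes known rewards, a Dirichlet posterior over transition probabilities, and a toy environment exposing reset()/step(); all names (counts, horizon, env) are illustrative and not taken from the cited works.

```python
# Minimal tabular PSRL sketch: sample an MDP from the posterior, plan greedily
# in it, act for one episode, and update the transition counts.
import numpy as np

def sample_transitions(counts):
    """Sample a transition model P[s, a, :] from a Dirichlet posterior (prior Dir(1))."""
    n_states, n_actions, _ = counts.shape
    P = np.zeros_like(counts, dtype=float)
    for s in range(n_states):
        for a in range(n_actions):
            P[s, a] = np.random.dirichlet(counts[s, a] + 1.0)
    return P

def greedy_policy(P, R, horizon):
    """Finite-horizon value iteration in the sampled MDP; returns a greedy policy."""
    n_states, n_actions, _ = P.shape
    V = np.zeros(n_states)
    pi = np.zeros((horizon, n_states), dtype=int)
    for t in reversed(range(horizon)):
        Q = R + P @ V            # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] * V[s']
        pi[t] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return pi

def psrl_episode(env, counts, R, horizon):
    """One PSRL episode against a toy env with reset()/step() (assumed interface)."""
    P_hat = sample_transitions(counts)
    pi = greedy_policy(P_hat, R, horizon)
    s = env.reset()
    for t in range(horizon):
        a = pi[t, s]
        s_next, _, done = env.step(a)
        counts[s, a, s_next] += 1   # posterior update for the observed transition
        s = s_next
        if done:
            break
    return counts
```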
Efficient Exploration in Resource-Restricted Reinforcement Learning
Wang, Zhihai, Pan, Taoxing, Zhou, Qi, Wang, Jie
In many real-world applications of reinforcement learning (RL), performing actions requires consuming certain types of resources that are non-replenishable in each episode. Typical applications include robotic control with limited energy and video games with consumable items. In tasks with non-replenishable resources, we observe that popular RL methods such as soft actor critic suffer from poor sample efficiency. The major reason is that they tend to exhaust resources fast, and the subsequent exploration is then severely restricted due to the absence of resources. To address this challenge, we first formalize the aforementioned problem as resource-restricted reinforcement learning, and then propose a novel resource-aware exploration bonus (RAEB) to make reasonable usage of resources. An appealing feature of RAEB is that it can significantly reduce unnecessary resource-consuming trials while effectively encouraging the agent to explore unvisited states. Experiments demonstrate that the proposed RAEB significantly outperforms state-of-the-art exploration strategies in resource-restricted reinforcement learning environments, improving the sample efficiency by up to an order of magnitude.
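The abstract does not give RAEB's exact form, so the following is purely a hypothetical illustration of what a resource-aware exploration bonus could look like: a count-based novelty term that is attenuated when an action consumes part of a non-replenishable budget. The class name, parameters, and attenuation rule are assumptions for illustration, not the paper's formulation.

```python
# Hypothetical resource-aware bonus (NOT the paper's RAEB formula):
# novelty bonus for rarely visited states, scaled down for actions that
# spend scarce, non-replenishable resources.
from collections import defaultdict
import numpy as np

class ResourceAwareBonus:
    def __init__(self, beta=1.0, resource_penalty=0.5):
        self.visit_counts = defaultdict(int)  # discretized state -> visit count
        self.beta = beta                      # scale of the novelty bonus
        self.resource_penalty = resource_penalty

    def __call__(self, state_key, resource_cost, resources_left, budget):
        self.visit_counts[state_key] += 1
        novelty = self.beta / np.sqrt(self.visit_counts[state_key])
        if resource_cost <= 0:
            return novelty                    # free actions keep the full bonus
        # Attenuate the bonus for resource-consuming actions, more strongly as
        # the remaining budget shrinks, to discourage wasteful trials.
        scarcity = 1.0 - resources_left / budget
        return novelty * (1.0 - self.resource_penalty * scarcity)
```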
- Leisure & Entertainment > Games (0.34)
- Energy > Oil & Gas > Upstream (0.34)
Factored Adaptation for Non-Stationary Reinforcement Learning
Feng, Fan, Huang, Biwei, Zhang, Kun, Magliacane, Sara
Dealing with non-stationarity in environments (e.g., in the transition dynamics) and objectives (e.g., in the reward functions) is a challenging problem that is crucial in real-world applications of reinforcement learning (RL). While most current approaches model the changes as a single shared embedding vector, we leverage insights from the recent causality literature to model non-stationarity in terms of individual latent change factors and causal graphs across different environments. In particular, we propose Factored Adaptation for Non-Stationary RL (FANS-RL), a factored adaptation approach that jointly learns both the causal structure in terms of a factored MDP and a factored representation of the individual time-varying change factors. We prove that under standard assumptions, we can completely recover the causal graph representing the factored transition and reward function, as well as a partial structure between the individual change factors and the state components. Through our general framework, we can consider general non-stationary scenarios with different function types and changing frequencies, including changes across episodes and within episodes. Experimental results demonstrate that FANS-RL outperforms existing approaches in terms of return, compactness of the latent state representation, and robustness to varying degrees of non-stationarity.
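To make the factored-MDP structure concrete, here is a schematic sketch of a transition model in which each next-state component depends only on a masked set of causal parents and on its own latent change factor. It illustrates the structure described above, not the FANS-RL architecture; parent_masks, component_fns, and change_factors are hypothetical names.

```python
# Schematic factored dynamics: next-state component i is produced from its
# causal parents among (state, action) and its individual change factor theta_i,
# which may drift across or within episodes.
import numpy as np

class FactoredDynamics:
    def __init__(self, state_dim, action_dim, parent_masks, component_fns):
        # parent_masks[i]: boolean mask over the concatenated (state, action)
        # vector selecting the causal parents of state component i.
        # component_fns[i]: function (parents, theta_i) -> next value of s_i.
        self.state_dim = state_dim
        self.action_dim = action_dim
        self.parent_masks = parent_masks
        self.component_fns = component_fns

    def step(self, state, action, change_factors):
        inputs = np.concatenate([state, action])
        next_state = np.zeros(self.state_dim)
        for i in range(self.state_dim):
            parents = inputs[self.parent_masks[i]]  # only the causal parents
            next_state[i] = self.component_fns[i](parents, change_factors[i])
        return next_state
```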
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- Asia > China > Hong Kong (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Information Technology (0.67)
- Health & Medicine (0.46)
Heteroscedastic Bayesian Optimisation for Stochastic Model Predictive Control
Guzman, Rel, Oliveira, Rafael, Ramos, Fabio
Model predictive control (MPC) has been successful in applications involving the control of complex physical systems. This class of controllers leverages the information provided by an approximate model of the system's dynamics to simulate the effect of control actions. MPC methods also expose a few hyper-parameters whose tuning can be relatively expensive, as it demands interactions with the physical system. We therefore investigate fine-tuning MPC methods in the context of stochastic MPC, which presents extra challenges due to the randomness of the controller's actions. In these scenarios, performance outcomes are noisy, and the noise is not homogeneous across the domain of possible hyper-parameter settings but varies in an input-dependent way. To address these issues, we propose a Bayesian optimisation framework that accounts for heteroscedastic noise to tune hyper-parameters in control problems. Empirical results on benchmark continuous control tasks and a physical robot support the proposed framework's suitability relative to baselines that do not take heteroscedasticity into account.
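As a sketch of the general idea (not the authors' framework), the following Bayesian-optimisation loop handles input-dependent noise by estimating a per-point noise variance from repeated MPC rollouts and placing it on the GP's diagonal via scikit-learn's per-sample alpha. The helper evaluate_mpc, the LCB acquisition, and all parameter values are assumptions for illustration.

```python
# Heteroscedastic BO sketch for tuning a single MPC hyper-parameter:
# repeated rollouts give a per-point noise estimate, which enters the GP
# as an input-dependent diagonal term.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def heteroscedastic_bo(evaluate_mpc, bounds, n_iter=20, n_repeats=5, n_init=4):
    rng = np.random.default_rng(0)
    X = rng.uniform(bounds[0], bounds[1], size=(n_init, 1))
    means, variances = [], []
    for x in X:
        costs = evaluate_mpc(float(x[0]), n_repeats)        # repeated noisy rollouts
        means.append(np.mean(costs))
        variances.append(np.var(costs) / n_repeats + 1e-6)  # per-point noise estimate
    for _ in range(n_iter):
        # Per-observation noise variance on the GP diagonal (heteroscedastic).
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5),
                                      alpha=np.array(variances),
                                      normalize_y=True)
        gp.fit(X, np.array(means))
        # Lower-confidence-bound acquisition (minimising cost) over a candidate grid.
        cand = np.linspace(bounds[0], bounds[1], 512).reshape(-1, 1)
        mu, sigma = gp.predict(cand, return_std=True)
        x_next = cand[np.argmin(mu - 2.0 * sigma)]
        costs = evaluate_mpc(float(x_next[0]), n_repeats)
        X = np.vstack([X, x_next])
        means.append(np.mean(costs))
        variances.append(np.var(costs) / n_repeats + 1e-6)
    best = int(np.argmin(means))
    return float(X[best, 0])
```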
- North America > United States (0.28)
- Oceania > Australia (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)