AITopics | autombpo

Appendix for On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems | Apr-25-2026, 00:31:26 GMT

We call c(m) the m-step concentrability of a future-state distribution and call Cρ,µ the discounted-average concentrability coefficient of the future-state distributions. The class of MDPs that satisfies this concentrability assumption is quite large, as further discussed in Munos and Szepesvári [18]. If Xi, i = 1,...,N is an i.i.d. [...] And when q = 1, N is used instead of N1. From the definition, one can easily see that Nq,FX1:N N. Lemma A.2. (Single Iteration Error Bound) Let Vk and Vk+1 be the value functions of iterations k and k+1, and Vmax = rmax/(1 − γ).

artificial intelligence, machine learning, reinforcement learning, (17 more...)

Genre: Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

1e4d36177d71bbb3558e43af9577d70e-Paper.pdf

Neural Information Processing Systems | Apr-25-2026, 00:31:23 GMT

artificial intelligence, machine learning, reinforcement learning, (13 more...)

Country: Asia (0.28)

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Robots (0.94)

1e4d36177d71bbb3558e43af9577d70e-Supplemental.pdf

Neural Information Processing Systems | Feb-7-2026, 18:25:36 GMT

autombpo, hyperparameter, policy training iteration, (13 more...)

Genre: Research Report > Experimental Study (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.40)

On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems | Feb-7-2026, 18:25:32 GMT

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

Country: Asia > China > Shanghai > Shanghai (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems | Dec-23-2025, 20:28:47 GMT

Model-based reinforcement learning has attracted wide attention due to its superior sample efficiency. Despite its impressive success so far, it is still unclear how to appropriately schedule the important hyperparameters, such as the real data ratio for policy optimization in Dyna-style model-based algorithms, to achieve adequate performance. In this paper, we first theoretically analyze the role of real data in policy training, which suggests that gradually increasing the ratio of real data yields better performance. Inspired by the analysis, we propose a framework named AutoMBPO to automatically schedule the real data ratio as well as other hyperparameters in training the model-based policy optimization (MBPO) algorithm, a representative running case of model-based methods. On several continuous control tasks, the MBPO instance trained with hyperparameters scheduled by AutoMBPO can significantly surpass the original one, and the real data ratio schedule found by AutoMBPO is consistent with our theoretical analysis.
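
To make the notion of a real data ratio concrete, the following is a minimal sketch of how a Dyna-style trainer might mix real and model-generated transitions in each policy-training batch while the ratio grows over training. It is not the AutoMBPO method, which learns its schedule automatically; the linear schedule, the function names `real_ratio_schedule` and `sample_mixed_batch`, and the endpoint values 0.05 and 0.5 are all assumptions chosen only for illustration.

```python
import random


def real_ratio_schedule(step, total_steps, start=0.05, end=0.5):
    """Hypothetical linear schedule: the real data ratio rises from
    `start` to `end` as training progresses (values are illustrative)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def sample_mixed_batch(real_buffer, model_buffer, batch_size, real_ratio, rng=random):
    """Draw a policy-training batch in which roughly `real_ratio` of the
    transitions come from the real environment and the rest from the model."""
    n_real = min(int(round(batch_size * real_ratio)), batch_size)
    n_model = batch_size - n_real
    real = [rng.choice(real_buffer) for _ in range(n_real)]
    model = [rng.choice(model_buffer) for _ in range(n_model)]
    return real + model
```

Under these assumptions, a training loop would call `real_ratio_schedule(step, total_steps)` once per policy update and pass the result to `sample_mixed_batch`, so later updates rely more heavily on real transitions.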

effective scheduling, model-based reinforcement learning, name change, (9 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.31)

On Effective Scheduling of Model-based Reinforcement Learning

Neural Information Processing Systems | Oct-9-2024, 17:21:56 GMT

effective scheduling, hyperparameter, model-based reinforcement learning, (5 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.66)

On Effective Scheduling of Model-based Reinforcement Learning

Lai, Hang, Shen, Jian, Zhang, Weinan, Huang, Yimin, Zhang, Xing, Tang, Ruiming, Yu, Yong, Li, Zhenguo

arXiv.org Artificial Intelligence | Nov-16-2021

autombpo, average return, hyperparameter, (12 more...)

2111.0855

Country: