Conservative Dual Policy Optimization for Efficient Model-Based Reinforcement Learning

Feb-11-2026, 02:52:59 GMT–Neural Information Processing Systems

Based on the principle of optimism in the face of uncertainty (OFU) [56, 49, 10], OFU-RL achieves global optimality by ensuring that the optimistically biased value is close to the real value in the long run. Based on Thompson Sampling [62], Posterior Sampling RL (PSRL) [57, 42, 43] explores by greedily optimizing the policy in an MDP that is sampled from the posterior distribution over MDPs.
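The PSRL loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the method proposed in this paper. It assumes a small tabular, finite-horizon MDP whose rewards are known, with an independent Dirichlet prior over each transition row. The toy MDP and the helper names (`sample_dirichlet`, `plan`, `psrl`) are hypothetical. Each episode runs three steps:

1. Sample one MDP from the posterior.
2. Solve that sampled MDP greedily with value iteration.
3. Act in the true MDP and update the transition counts.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one probability vector from Dirichlet(alphas) via normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def plan(P, R, H):
    """Finite-horizon value iteration; returns policy[h][s] = greedy action at step h."""
    S, A = len(R), len(R[0])
    V = [0.0] * S  # value after the final step is zero
    policy = [[0] * S for _ in range(H)]
    for h in reversed(range(H)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][t] * V[t] for t in range(S)) for a in range(A)]
            best = max(range(A), key=lambda a: q[a])  # ties go to the lowest action index
            policy[h][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy


def psrl(P_true, R, H, episodes, prior=1.0, seed=0):
    """Posterior sampling RL; assumes known rewards R and a Dirichlet(prior) transition prior."""
    rng = random.Random(seed)
    S, A = len(R), len(R[0])
    counts = [[[0] * S for _ in range(A)] for _ in range(S)]  # counts[s][a][s']
    returns = []
    for _ in range(episodes):
        # 1. Sample one MDP (transition kernel) from the current posterior.
        P_sample = [[sample_dirichlet([prior + c for c in counts[s][a]], rng)
                     for a in range(A)] for s in range(S)]
        # 2. Greedily optimize the policy in the sampled MDP.
        policy = plan(P_sample, R, H)
        # 3. Execute that policy in the true MDP and record observed transitions.
        s, total = 0, 0.0
        for h in range(H):
            a = policy[h][s]
            total += R[s][a]
            s_next = rng.choices(range(S), weights=P_true[s][a])[0]
            counts[s][a][s_next] += 1
            s = s_next
        returns.append(total)
    return returns, counts


# Hypothetical 2-state, 2-action MDP: state 1 pays reward 1, and action 1 in state 0 tends to reach it.
R = [[0.0, 0.0], [1.0, 1.0]]
P_true = [[[0.9, 0.1], [0.1, 0.9]],
          [[0.0, 1.0], [0.5, 0.5]]]
returns, counts = psrl(P_true, R, H=3, episodes=20)
```

Because a new MDP is sampled every episode, exploration comes from posterior uncertainty rather than from an explicit optimism bonus, which is the contrast with OFU-RL drawn in the text. As the counts grow, the sampled kernels concentrate around `P_true`, and the greedy policy approaches the one that `plan(P_true, R, H)` would return.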

artificial intelligence, machine learning, reinforcement learning
