Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion

Jan-19-2025, 19:32:52 GMT–Neural Information Processing Systems

Distributional reinforcement learning algorithms have attempted to use estimated uncertainty for exploration, for example through optimism in the face of uncertainty. However, using the estimated variance for optimistic exploration may cause biased data collection and hinder convergence or performance. In this paper, we present a novel distributional reinforcement learning algorithm that selects actions by randomizing the risk criterion without losing the risk-neutral objective. We provide a perturbed distributional Bellman optimality operator by distorting the risk measure. We also prove the convergence and optimality of the proposed method under a weaker contraction property.
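The idea of selecting actions under a randomized risk criterion can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not specify the distortion or its sampling scheme, so the tail-averaging distortion, the uniform sampling of the risk parameter `beta`, and the names `distorted_value`, `select_action`, and `scale` are all assumptions made for illustration. Each action is assumed to come with a list of estimated return quantiles.

```python
import random


def distorted_value(quantiles, beta):
    """Risk-adjusted value of one action (hypothetical distortion).

    beta in [-1, 1]:
      beta == 0 -> plain mean of the quantiles (risk-neutral)
      beta <  0 -> mean of the lowest (1 - |beta|) fraction (risk-averse)
      beta >  0 -> mean of the highest (1 - |beta|) fraction (risk-seeking)
    """
    q = sorted(quantiles)
    # Keep at least one quantile so the average is always defined.
    k = max(1, round((1.0 - abs(beta)) * len(q)))
    tail = q[-k:] if beta > 0 else q[:k]
    return sum(tail) / len(tail)


def select_action(quantiles_per_action, scale, rng=random):
    """Pick the greedy action under a freshly sampled risk criterion.

    `scale` bounds the sampled risk parameter (assumed schedule: the caller
    shrinks it toward 0 during training, so action selection approaches
    the risk-neutral greedy choice).
    """
    beta = rng.uniform(-scale, scale)
    values = [distorted_value(q, beta) for q in quantiles_per_action]
    # Index of the largest risk-adjusted value; ties go to the lowest index.
    return max(range(len(values)), key=values.__getitem__)


# Two actions: a wide return distribution and a narrow one.
quantiles_per_action = [[0.0, 1.0, 2.0, 3.0], [1.0, 1.0, 1.0, 1.5]]
action = select_action(quantiles_per_action, scale=1.0)
```

With `scale=0.0` the sampled `beta` is always 0 and the choice reduces to comparing means. With a larger `scale`, the wide-distribution action is preferred under risk-seeking draws and the narrow one under risk-averse draws, so exploration is not tied to a fixed optimistic bonus.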

distributional reinforcement learning, optimism, randomizing risk criterion, (3 more...)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)