Reviews: A Family of Robust Stochastic Operators for Reinforcement Learning

Jan-26-2025, 05:22:13 GMT–Neural Information Processing Systems

SUMMARY: The paper considers the problem of designing a Bellman-like operator with certain properties: 1) the optimality-preserving property: the greedy policy of the converged action-value function is the optimal policy; and 2) the action-gap increasing property. The motivation for the action-gap increasing property comes from the result of Farahmand [12], which shows that the distribution of the action-gap is a factor in the convergence to the optimal policy. Roughly speaking, when the action-gap is large, errors in estimating the action-value function Q become less important. As a result, we might converge to the optimal policy even though the estimated action-value function is far from the optimal one. Bellemare et al. [5] propose some operators that have these properties.
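To make the action-gap argument concrete, here is a minimal sketch in Python. It is not taken from the paper or from the review: the Q-values are hypothetical numbers chosen for illustration, and the helper names `action_gap` and `greedy` are invented here. It relies only on the standard observation that if every entry of an estimate is within half the action-gap of the true value, the greedy action cannot change.

```python
def action_gap(q_row):
    """Difference between the best and second-best action values in one state."""
    best, second = sorted(q_row, reverse=True)[:2]
    return best - second


def greedy(q_row):
    """Index of the action with the highest value."""
    return max(range(len(q_row)), key=lambda a: q_row[a])


# Hypothetical optimal action values: two states, three actions each.
q_star = [
    [1.0, 0.9, 0.2],  # small action-gap (about 0.1)
    [2.0, 0.5, 0.1],  # large action-gap (1.5)
]

# Hypothetical estimate whose entries are off by at most about 0.2.
q_hat = [
    [0.8, 1.1, 0.2],
    [1.8, 0.7, 0.1],
]

gaps = [action_gap(row) for row in q_star]
same_greedy = [greedy(s) == greedy(h) for s, h in zip(q_star, q_hat)]

for state, (gap, same) in enumerate(zip(gaps, same_greedy)):
    print(f"state {state}: gap={gap:.2f}, greedy action preserved: {same}")
```

In this toy example the same estimation error flips the greedy action in the small-gap state but leaves it unchanged in the large-gap state, which is the intuition behind wanting an operator that increases the action-gap.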

Genre:
- Research Report (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.50)