Reviews: Sample-Efficient Reinforcement Learning with Stochastic Ensemble Value Expansion

Oct-8-2024, 09:11:16 GMT – Neural Information Processing Systems

The main algorithmic idea is a weighted combination of H-step temporal-difference estimates, computed over H steps rolled out by a learned model of the environment. The underlying idea is to let the learner trade off estimation errors in the model against those in the Q function in different parts of the state-action space during learning. The updated TD estimator is incorporated into the DDPG algorithm in a straightforward manner. The update is more computationally intensive, but the result is improved sample complexity. The experimental results on a variety of continuous control tasks show significant improvement over the baseline DDPG and over MVE, a related method that is the precursor to this work. Overall, the paper is well written and the empirical results are very promising. The analysis and discussion are a bit limited, but this is not a major drawback. There is much to like about the paper.
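To make the weighted combination concrete, here is a minimal sketch and not the paper's implementation. It assumes an ensemble of model rollouts and weights each horizon by the inverse of the ensemble variance of its target; the review itself says only "weighted combination", so this weighting rule is an assumption. The function names, array shapes, and the `eps` parameter are hypothetical and chosen for illustration only.

```python
import numpy as np


def horizon_targets(rewards, values, gamma):
    """Candidate TD targets for every rollout horizon.

    rewards: shape (E, H), reward at each of H rollout steps for E ensemble members.
    values:  shape (E, H), Q estimate at the state reached after step h + 1.
    Returns an (E, H) array; column h holds the target that sums the first
    h + 1 discounted rewards and then bootstraps from the Q estimate.
    """
    horizon = rewards.shape[1]
    discounts = gamma ** np.arange(horizon)             # gamma^0 .. gamma^(H-1)
    returns = np.cumsum(rewards * discounts, axis=1)    # discounted partial sums
    return returns + gamma * discounts * values         # add gamma^(h+1) * Q


def combine_targets(targets, eps=1e-8):
    """Combine per-horizon targets with inverse-variance weights (assumed rule).

    Horizons on which the ensemble members disagree get a small weight.
    """
    mean = targets.mean(axis=0)
    var = targets.var(axis=0) + eps                     # eps avoids division by zero
    weights = 1.0 / var
    weights = weights / weights.sum()                   # normalise to sum to 1
    return float(np.dot(weights, mean)), weights


# Hypothetical usage: 4 ensemble members, 3-step rollouts, random placeholder data.
rng = np.random.default_rng(0)
rewards = rng.normal(size=(4, 3))
values = rng.normal(size=(4, 3))
target, weights = combine_targets(horizon_targets(rewards, values, gamma=0.99))
```

In a DDPG-style critic update, the combined `target` would stand in for the usual one-step TD target. The extra rollouts and ensemble evaluations are where the additional computation noted in the review comes from.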


