Reviews: RUDDER: Return Decomposition for Delayed Rewards

Jan-21-2025, 22:34:08 GMT – Neural Information Processing Systems

The reward redistribution method is proven to preserve optimal policies and to reduce the expected future reward to zero. It does so by redistributing delayed rewards to the salient state-action events, where saliency is determined by contribution analysis methods. Extensive experiments on toy domains and on the suite of Atari games demonstrate the method's improvements on delayed-reward tasks, as well as the shortcomings of MC and TD methods on such tasks. Comments: I find the work presented in the paper outstanding. It makes numerous contributions that could conceivably stand on their own (resulting in an extremely large appendix!).
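To make the redistribution idea concrete, here is a minimal sketch; it is not the authors' implementation. It assumes a hypothetical return predictor that outputs an estimate of the episode's final return after every step, and it uses one simple form of contribution analysis: each step is credited with the change in that estimate. The function name `redistribute` and the prediction values are made up for illustration.

```python
def redistribute(predictions, initial=0.0):
    """Credit each step with the change in the predicted final return.

    `predictions[t]` is a (hypothetical) model's estimate of the episode
    return after observing steps 0..t. The differences telescope, so the
    redistributed rewards sum to the last prediction minus `initial`.
    """
    rewards = []
    previous = initial
    for g in predictions:
        rewards.append(g - previous)
        previous = g
    return rewards


# Illustrative episode: the environment pays 1.0 only at the final step,
# but the assumed predictor becomes certain of that return at step 2.
predictions = [0.0, 0.0, 1.0, 1.0, 1.0]
redistributed = redistribute(predictions)

print(redistributed)       # reward is moved to the salient step 2
print(sum(redistributed))  # total return is unchanged
```

In this toy episode the whole reward is moved to the step where the prediction jumps, while the sum over the episode stays equal to the original return. If the predictor were exact, nothing would remain to be expected after the salient step, which is the intuition behind the expected future reward being zero.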

experiment, return decomposition, rudder

Industry:
- Leisure & Entertainment > Games > Computer Games (0.59)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.56)