RUDDER: Return Decomposition for Delayed Rewards

Jose A. Arjona-Medina, Michael Gillhofer, Michael Widrich, Thomas Unterthiner, Johannes Brandstetter, Sepp Hochreiter

Neural Information Processing Systems 

We propose RUDDER, a novel reinforcement learning approach for delayed rewards in finite Markov decision processes (MDPs).
