RUDDER: Return Decomposition for Delayed Rewards

anonymous

Neural Information Processing Systems 

reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM
