RUDDER: Return Decomposition for Delayed Rewards
Neural Information Processing Systems
reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM
Nov-15-2025, 21:56:37 GMT