RUDDER: Return Decomposition for Delayed Rewards
Neural Information Processing Systems
reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM
Feb-11-2026, 13:56:14 GMT