RUDDER: Return Decomposition for Delayed Rewards
Neural Information Processing Systems
reinforcement learning; delayed reward; reward redistribution; return decomposition; bias-variance; credit assignment; LSTM
Oct-2-2025, 05:30:42 GMT