Evolution-Guided Policy Gradient in Reinforcement Learning
Neural Information Processing Systems
Temporal Difference methods in RL use bootstrapping to address this issue, but they often struggle when time horizons are long and rewards are sparse.
Feb-13-2026, 13:37:30 GMT