Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning A Stiffness Analysis
–Neural Information Processing Systems
The green lines in Figure 1 demonstrate that the stiffness decreases as the number of training levels increases in most of the Procgen games. This suggests that the delayed critic update effectively alleviates the memorization problem. Each agent is trained on 200 training levels for 25M environment steps. Each agent is trained for 8M environment steps. The mean is computed over 10 different runs.
Neural Information Processing Systems
Aug-19-2025, 13:06:37 GMT
- Technology: