Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning
Neural Information Processing Systems
Then, we calculate the mean stiffness of the value network across all state pairs and report its average over all training epochs. Each agent is trained on 200 training levels for 25M environment steps. The mean and standard deviation are computed over 10 different runs. More specifically, we collect 100 training episodes throughout training and evaluate the value network's prediction for the initial state of each trajectory.
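The stiffness measurement described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes stiffness between two states is the cosine similarity of their per-state value-loss gradients with respect to the network parameters, and it uses a hypothetical linear value function V(s) = w · φ(s) with a squared loss so the gradients have a closed form. The helper names `per_sample_grads` and `mean_stiffness`, the random features and targets, and the learning rate are all hypothetical.

```python
import numpy as np

def per_sample_grads(w, features, targets):
    """Gradient of 0.5 * (w @ phi(s) - target)^2 w.r.t. w, one row per state.

    Assumption: a linear value function over hypothetical features phi(s).
    """
    residuals = features @ w - targets       # shape (n,)
    return residuals[:, None] * features     # shape (n, d)

def mean_stiffness(grads, eps=1e-12):
    """Average cosine similarity between gradients over all distinct pairs i < j."""
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, eps)    # normalize each gradient row
    cos = unit @ unit.T                      # (n, n) matrix of cosine similarities
    iu = np.triu_indices(len(grads), k=1)    # indices of distinct pairs only
    return float(cos[iu].mean())

# Hypothetical training loop: measure stiffness each epoch, then average over epochs.
rng = np.random.default_rng(0)
features = rng.normal(size=(32, 8))          # 32 states, 8 hypothetical features
targets = rng.normal(size=32)                # hypothetical value targets
w = np.zeros(8)
per_epoch = []
for epoch in range(5):
    g = per_sample_grads(w, features, targets)
    per_epoch.append(mean_stiffness(g))
    w -= 0.05 * g.mean(axis=0)               # one full-batch gradient step
print(np.mean(per_epoch))
```

In a deep value network, the per-state gradients would instead come from automatic differentiation over all parameters, and the per-epoch averages from each of the 10 runs would then be summarized by their mean and standard deviation.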
Feb-12-2026, 10:27:47 GMT