Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning

Neural Information Processing Systems 

Then, we calculate the mean stiffness of the value network across all state pairs and report its average computed over all training epochs. Each agent is trained on 200 training levels for 25M environment steps. The mean and standard deviation are computed over 10 different runs. More specifically, we collect 100 training episodes throughout the training and evaluate the value network prediction for the initial state of each trajectory.
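The mean stiffness over state pairs described above can be illustrated with a small sketch. This excerpt does not define stiffness, so the code assumes a common definition: the cosine similarity between the loss gradients of two states, averaged over all distinct pairs. The function names, the toy linear value function, and the random data are hypothetical and stand in for the actual value network and collected states.

```python
import numpy as np


def mean_pairwise_stiffness(grads):
    """Mean cosine similarity between gradients of all distinct state pairs.

    grads: array-like of shape (n_states, n_params), one flattened
    loss gradient per state. ASSUMPTION: stiffness = gradient cosine similarity.
    """
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.clip(norms, 1e-12, None)  # guard against zero gradients
    cos = unit @ unit.T                          # pairwise cosine similarities
    i, j = np.triu_indices(len(grads), k=1)      # each unordered pair once
    return float(cos[i, j].mean())


def linear_value_grads(states, targets, w):
    """Per-state gradient of 0.5 * (w . s - target)^2 with respect to w.

    Hypothetical toy value function, used only to produce example gradients.
    """
    errors = states @ w - targets
    return errors[:, None] * states


# Illustrative data only: 100 random "states" with random value targets.
rng = np.random.default_rng(0)
states = rng.normal(size=(100, 8))
targets = rng.normal(size=100)
w = np.zeros(8)

stiffness = mean_pairwise_stiffness(linear_value_grads(states, targets, w))
print(f"mean stiffness: {stiffness:.4f}")
```

In a setting like the one described, the gradients would come from the value network's loss at each collected state, and the resulting scalar would then be averaged over training epochs and runs.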
