Supplementary Material for Rethinking Value Function Learning for Generalization in Reinforcement Learning

Neural Information Processing Systems

Then, we calculate the mean stiffness of the value network across all state pairs and report its average computed over all training epochs. Each agent is trained on 200 training levels for 25M environment steps. The mean and standard deviation are computed over 10 different runs. More specifically, we collect 100 training episodes throughout the training and evaluate the value network prediction for the initial state of each trajectory.
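
The excerpt does not define stiffness, so the following is only a minimal sketch under an assumption: stiffness between two states is taken to be the cosine similarity of the loss gradients computed at those states, which is one common definition and may differ from the one used in the paper. The function name `mean_stiffness` and the toy gradient values are hypothetical and serve illustration only.

```python
import numpy as np

def mean_stiffness(grads):
    """Mean pairwise cosine similarity between per-state gradient vectors.

    ASSUMPTION: stiffness = cosine similarity of gradients; the excerpt
    does not state which definition the paper uses.
    grads: array-like of shape (n_states, n_params), one flattened
    gradient of the value loss per state.
    """
    grads = np.asarray(grads, dtype=float)
    n = grads.shape[0]
    if n < 2:
        raise ValueError("need at least two states to form a pair")
    # Normalize each gradient to unit length (guarding against zero norm).
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, 1e-12)
    # Cosine similarity for every pair of states.
    cos = unit @ unit.T
    # Average over distinct pairs only (strict upper triangle).
    upper = np.triu_indices(n, k=1)
    return float(cos[upper].mean())

# Hypothetical toy gradients: two aligned states and one orthogonal state.
toy_grads = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_value = mean_stiffness(toy_grads)
```

To follow the reporting described above, one would compute this quantity once per training epoch and then average the per-epoch values, repeating over independent runs to obtain a mean and standard deviation.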



Counterfactual Temporal Point Processes

Neural Information Processing Systems

Machine learning models based on temporal point processes are the state of the art in a wide variety of applications involving discrete events in continuous time.