 Reinforcement Learning




On the Estimation Bias in Double Q-Learning

Neural Information Processing Systems

One of the phenomena of interest is that Q-learning (Watkins, 1989) is known to suffer from overestimation issues, since it takes a maximum operator over a set of estimated action-values.
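The overestimation effect can be illustrated with a small simulation. This is a minimal sketch, not code from the paper: it assumes four actions whose true values are all zero and adds independent Gaussian noise to each estimate. The function names, the noise model, and the trial count are all illustrative assumptions. Since every true value is zero, any positive average of the maximum is bias introduced by the maximum operator alone.

```python
import random


def max_of_noisy_estimates(true_values, noise_std, rng):
    """Return the maximum over noisy estimates of the true action-values."""
    return max(q + rng.gauss(0.0, noise_std) for q in true_values)


def average_max_estimate(true_values, noise_std=1.0, trials=20000, seed=0):
    """Average the max of the noisy estimates over many independent trials."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    total = sum(
        max_of_noisy_estimates(true_values, noise_std, rng)
        for _ in range(trials)
    )
    return total / trials


# Assumed toy setting: four actions, all with true value 0.
true_q = [0.0, 0.0, 0.0, 0.0]

# The true maximum is 0, so the gap below is pure estimation bias.
bias = average_max_estimate(true_q) - max(true_q)
print(f"estimated bias of the max operator: {bias:.3f}")
```

With zero noise the bias vanishes, and in this toy setting it grows with both the noise level and the number of actions.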






Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Neural Information Processing Systems

This component makes current Seldonian algorithms safe: the safety test checks whether necessary safety constraints are satisfied with high probability.
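One way to picture such a safety test is as a high-confidence upper bound on a constraint value. The sketch below is an assumption-laden illustration, not the test used in the paper. It assumes the constraint is "the expected value of g is at most 0", that samples of g are independent and bounded in a known range, and it uses Hoeffding's inequality as the bound. The names `hoeffding_upper_bound` and `passes_safety_test` are hypothetical.

```python
import math


def hoeffding_upper_bound(samples, lower, upper, delta):
    """Upper bound on the mean that holds with probability >= 1 - delta.

    Assumes the samples are i.i.d. and each lies in [lower, upper].
    """
    n = len(samples)
    mean = sum(samples) / n
    return mean + (upper - lower) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def passes_safety_test(samples, lower, upper, delta=0.05):
    """Pass only if the high-confidence upper bound on E[g] is at most 0."""
    return hoeffding_upper_bound(samples, lower, upper, delta) <= 0.0


# Hypothetical constraint samples in [-1, 1]; negative means "safe".
many_samples = [-0.5] * 1000
few_samples = [-0.5] * 5

print(passes_safety_test(many_samples, -1.0, 1.0))
print(passes_safety_test(few_samples, -1.0, 1.0))
```

The same sample mean can pass with many samples and fail with few, because the confidence term shrinks as the sample size grows.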