"Reinforcement learning is learning what to do – how to map situations to actions – so as to maximize a numerical reward signal. The learner is not told which actions to take, as in most forms of machine learning, but instead must discover which actions yield the most reward by trying them." – Sutton, Richard S. and Andrew G. Barto. Reinforcement Learning: An Introduction. (1.1). MIT Press, Cambridge, MA, 1998.
In reinforcement learning, robust policies for high-stakes decision-making problems with limited data are usually computed by optimizing the percentile criterion .
Wemodel theeventofinterest asaspecial terminal state, andwe seek to estimate the survival distribution (i.e., the distribution of the hitting time for that terminal state) from anyother state.