Reinforcement Learning -- Generalisation of Off-Policy Learning

#artificialintelligence 

So far, we have extended our reinforcement learning topic from discrete states to continuous states and have elaborated a bit on applying tile coding to on-policy learning, where the learning process follows the trajectory the agent actually takes. Now let's talk about off-policy learning in continuous settings. In discrete settings, on-policy learning can easily be generalised to off-policy learning (say, from Sarsa to Q-learning). In continuous settings, where values are approximated rather than stored in a table, the generalisation is more troublesome, and in some scenarios the learned weights can diverge. The most prominent consequence is that off-policy learning is not guaranteed to converge in continuous settings. The main reason is that the updates in the off-policy case are not distributed according to the on-policy distribution: the state-action pairs used in an update might not be the state-action pairs the agent actually takes.
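To make the divergence issue concrete, here is a minimal sketch of a well-known two-state illustration. The setup is an assumption for illustration and does not come from the text above: two states with features 1 and 2, one shared weight `w` so that the estimated value is `w * feature`, zero reward everywhere, and a semi-gradient TD(0) update. The function name `run_updates` and all parameter values are hypothetical. When only the transition from the first state to the second is ever updated (an off-policy update distribution), `w` grows without bound. When the second state is also updated as often as it is visited (an on-policy distribution, assuming it leads to a terminal state), `w` shrinks towards zero.

```python
def run_updates(gamma=0.9, alpha=0.1, w0=1.0, steps=50, on_policy=False):
    """Semi-gradient TD(0) with one weight w and features x(s1)=1, x(s2)=2.

    Illustrative assumptions: all rewards are 0, s1 always moves to s2,
    and (in the on-policy case) s2 moves to a terminal state of value 0.
    """
    w = w0
    history = [w]
    for _ in range(steps):
        # Update for the transition s1 -> s2: target is gamma * v(s2) = gamma * 2w.
        td_error = 0.0 + gamma * (2.0 * w) - (1.0 * w)
        w += alpha * td_error * 1.0  # gradient of v(s1) w.r.t. w is x(s1) = 1

        if on_policy:
            # On-policy distribution: s2 is visited too, so it is also updated.
            td_error = 0.0 + gamma * 0.0 - (2.0 * w)
            w += alpha * td_error * 2.0  # gradient of v(s2) w.r.t. w is x(s2) = 2

        history.append(w)
    return history


if __name__ == "__main__":
    off = run_updates(on_policy=False)
    on = run_updates(on_policy=True)
    print("off-policy updates only, final w:", off[-1])
    print("on-policy updates, final w:", on[-1])
```

With these settings, each off-policy step multiplies `w` by `1 + alpha * (2 * gamma - 1)`, which is greater than 1 whenever `gamma > 0.5`. The extra update on the second state is what pulls the weight back in the on-policy case.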
