Reinforcement Learning -- Generalisation on Continuing Tasks
So far we have worked through many reinforcement learning examples, from on-policy to off-policy methods and from discrete to continuous state spaces. These examples differ in many ways, but you may have noticed that they share at least one trait: they are all episodic. Each has a clear starting point and ending point, and whenever the agent reaches the goal it starts over, repeating until a set number of episodes has run. In this article we extend the idea to non-episodic (continuing) tasks, in which there is no clear ending point and the interaction between agent and environment goes on forever, without termination or start states. The main concept applied to such tasks is the average reward.
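To make the average reward idea a little more concrete, here is a minimal sketch of one common way to estimate it: tabular differential TD(0). This is not necessarily the method used later in the article. The two-state toy task, the function name `differential_td`, and the step sizes `alpha` and `beta` are all assumptions made up for illustration. The agent simply alternates between two states forever, so there is no episode boundary and no discounting.

```python
def differential_td(num_steps=20000, alpha=0.1, beta=0.01):
    """Tabular differential TD(0) on a hypothetical two-state continuing task.

    The toy environment (an assumption, not from the article) alternates
    deterministically: state 0 -> state 1 with reward 1, and
    state 1 -> state 0 with reward 3. The task never terminates.
    """
    rewards = {0: 1.0, 1: 3.0}   # reward received when leaving each state
    values = [0.0, 0.0]          # differential value estimate per state
    avg_reward = 0.0             # running estimate of the average reward
    state = 0

    for _ in range(num_steps):
        next_state = 1 - state
        reward = rewards[state]

        # Differential TD error: the reward is measured relative to the
        # current average reward estimate, and there is no discount factor.
        delta = reward - avg_reward + values[next_state] - values[state]

        # Update the average reward estimate and the state value
        # using the same TD error.
        avg_reward += beta * delta
        values[state] += alpha * delta

        state = next_state

    return avg_reward, values


if __name__ == "__main__":
    avg_reward, values = differential_td()
    print("estimated average reward:", avg_reward)
    print("differential values:", values)
```

On this toy task the rewards alternate between 1 and 3, so the average reward estimate should approach 2. The values are only meaningful relative to each other, which is typical of the average reward setting.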
Oct-24-2019, 20:14:32 GMT