Rollout Sampling Approximate Policy Iteration

Dimitrakakis, Christos, Lagoudakis, Michail G.

arXiv.org Artificial Intelligence 

Supervised and reinforcement learning are two well-known learning paradigms that have been researched mostly independently. Recent studies have investigated the use of supervised learning methods for reinforcement learning, either for value function representation (Lagoudakis and Parr, 2003a; Riedmiller, 2005) or for policy representation (Lagoudakis and Parr, 2003b; Fern et al., 2004; Langford and Zadrozny, 2005). Initial results have shown that policies can be approximately represented using either multi-class classifiers or combinations of binary classifiers (Rexakis and Lagoudakis, 2008) and that it is therefore possible to incorporate classification algorithms within the inner loops of several reinforcement learning algorithms (Lagoudakis and Parr, 2003b; Fern et al., 2004). This viewpoint allows the performance of reinforcement learning algorithms to be quantified in terms of the performance of classification algorithms (Langford and Zadrozny, 2005). While this synergy makes a variety of promising combinations possible, so far it has produced few practical, widely applicable algorithms. Our work builds on that of Lagoudakis and Parr (2003b), who suggested an approximate policy iteration algorithm for learning a good policy represented as a classifier, avoiding representations of any kind of value function. At each iteration, a new policy/classifier is produced using training data obtained through extensive simulation (rollouts) of the previous policy on a generative model of the process. These rollouts aim at identifying better action choices over a subset of states in order to form a set of data for training the classifier representing the improved policy. A similar algorithm was proposed by Fern et al.
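
The loop described above (roll out the previous policy on a generative model, label the sampled states where one action looks clearly better, and train a classifier on those labels) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 10-state chain process, the `step` generative model, the fixed set of sampled states, the rollout horizon, and the hand-written 1-nearest-neighbour classifier standing in for whatever classifier one would actually use.

```python
N_STATES = 10        # hypothetical chain process with states 0..9
ACTIONS = (0, 1)     # 0 = move left, 1 = move right
GAMMA = 0.9          # discount factor (assumed)

def step(state, action):
    """Hypothetical generative model: simulate one step, return (next_state, reward)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward

def rollout_value(state, action, policy, horizon=30):
    """Discounted return of taking `action` in `state`, then following `policy`."""
    total, discount = 0.0, 1.0
    s, a = state, action
    for _ in range(horizon):
        s, r = step(s, a)
        total += discount * r
        discount *= GAMMA
        a = policy(s)
    return total

def train_classifier(examples):
    """Stand-in classifier: 1-nearest-neighbour over (state, action) examples."""
    def policy(state):
        nearest = min(examples, key=lambda ex: abs(ex[0] - state))
        return nearest[1]
    return policy

def improve(policy, sample_states):
    """One iteration: build training data from rollouts, fit a new policy."""
    examples = []
    for s in sample_states:
        q = {a: rollout_value(s, a, policy) for a in ACTIONS}
        best = max(q, key=q.get)
        # Label the state only if one action is strictly better than the rest.
        if all(q[best] > q[a] for a in ACTIONS if a != best):
            examples.append((s, best))
    # With no informative states, keep the previous policy unchanged.
    return train_classifier(examples) if examples else policy

def always_left(state):
    """Deliberately poor initial policy."""
    return 0

def policy_iteration(iterations=3, sample_states=(1, 3, 5, 7, 9)):
    policy = always_left
    for _ in range(iterations):
        policy = improve(policy, sample_states)
    return policy

learned = policy_iteration()
print([learned(s) for s in range(N_STATES)])
```

Because the process here is deterministic, a single rollout per state-action pair is enough. In a stochastic process each estimate would need to be averaged over many rollouts, which is where the simulation cost of this family of methods comes from. No value function is stored at any point: the only object carried between iterations is the classifier.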
