Fitted Q-Learning for Relational Domains

Srijita Das, Sriraam Natarajan, Kaushik Roy, Ronald Parr, Kristian Kersting

arXiv.org Artificial Intelligence 

Value function approximation in Reinforcement Learning (RL) has long been viewed through the lens of feature discovery (Parr et al. 2007). A classical set of approaches for this problem based on Approximate Dynamic Programming (ADP) is the fitted value iteration algorithm (Boyan and Moore 1995; Ernst, Geurts, and Wehenkel 2005; Riedmiller 2005), a batch-mode approximation scheme that employs function approximators in each iteration to represent the value estimates. Another popular class of methods that addresses this problem is Bellman error based methods (Menache, Mannor, and Shimkin 2005; Keller, Mannor, and Precup 2006).

We take two specific approaches: the first is to represent the lifted Q-value functions and the second is to represent the Bellman residuals, both using a set of relational regression trees (RRTs) (Blockeel and De Raedt 1998). A key aspect of our approach is that it is model-free, unlike most RMDP algorithms, which assume a model. The only exception is Fern et al. (2006), who directly learn in policy space. Our work differs from theirs in that we directly learn value functions, and eventually policies from them, and adapt the recently successful relational gradient boosting (RFGB) (Natarajan et al. 2014), which has been shown to outperform learning relational rules one by one.
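To make the batch-mode fitted Q-iteration scheme referred to above concrete, here is a minimal propositional sketch. It is not the paper's relational (lifted) implementation: where the paper fits relational regression trees / RFGB over first-order representations, this sketch uses scikit-learn's GradientBoostingRegressor on flat (state, action) feature vectors as a stand-in; the function name `fitted_q_iteration` and the transition-tuple format are illustrative assumptions.

```python
# Propositional sketch of batch-mode fitted Q-iteration with a tree-ensemble
# function approximator. This is an illustrative analogue, not the authors'
# relational RRT/RFGB implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.99, n_iterations=50):
    """transitions: list of (state_vector, action_index, reward, next_state_vector, done)."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    dones = np.array([t[4] for t in transitions], dtype=float)

    # Encode (state, action) pairs as one feature matrix: state features
    # concatenated with a one-hot action indicator.
    def featurize(S, A):
        return np.hstack([S, np.eye(n_actions)[A]])

    X = featurize(states, actions)
    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            # First pass: the Q-value target is just the immediate reward.
            targets = rewards
        else:
            # Bootstrap: max over actions of the current Q estimate at next states.
            q_next = np.column_stack([
                q_model.predict(featurize(next_states,
                                          np.full(len(next_states), a, dtype=int)))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
        # Each iteration refits a fresh regressor to the bootstrapped targets
        # (batch-mode fitted value iteration). The paper's second variant instead
        # fits trees to the Bellman residuals (targets minus current predictions).
        q_model = GradientBoostingRegressor(n_estimators=50, max_depth=3)
        q_model.fit(X, targets)
    return q_model
```

The refit-from-scratch step each iteration is what distinguishes this batch fitted scheme from online temporal-difference updates; the relational setting replaces the flat feature vectors with first-order state descriptions and the regressor with relational regression trees.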
