Fitted Q-Learning for Relational Domains

Srijita Das, Sriraam Natarajan, Kaushik Roy, Ronald Parr, Kristian Kersting

arXiv.org Artificial Intelligence 

Value function approximation in Reinforcement Learning (RL) has long been viewed through the lens of feature discovery (Parr et al. 2007). A classical set of approaches for this problem based on Approximate Dynamic Programming (ADP) is the fitted value iteration algorithm (Boyan and Moore 1995; Ernst, Geurts, and Wehenkel 2005; Riedmiller 2005), a batch-mode approximation scheme that employs function approximators in each iteration to represent the value estimates. Another popular class of methods that addresses this problem is Bellman error based methods (Menache, Mannor, and Shimkin 2005; Keller, Mannor, and Precup 2006).

We take two specific approaches: the first is to represent the lifted Q-value functions and the second is to represent the Bellman residuals, both using a set of relational regression trees (RRTs) (Blockeel and De Raedt 1998). A key aspect of our approach is that it is model-free, unlike most RMDP algorithms, which assume a model. The only exception is Fern et al. (2006), who directly learn in policy space. Our work differs from theirs in that we directly learn value functions, and eventually policies from them, and adapt the recently successful relational gradient boosting (RFGB) (Natarajan et al. 2014), which has been shown to outperform learning relational rules one by one.
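To make the batch-mode fitted Q-iteration scheme referred to above concrete, here is a minimal propositional sketch. It is not the paper's relational (lifted) implementation: where the paper fits relational regression trees / RFGB over first-order representations, this sketch uses scikit-learn's GradientBoostingRegressor on flat (state, action) feature vectors as a stand-in; the function name `fitted_q_iteration` and the transition-tuple format are illustrative assumptions.

```python
# Propositional sketch of batch-mode fitted Q-iteration with a tree-ensemble
# function approximator. This is an illustrative analogue, not the authors'
# relational RRT/RFGB implementation.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def fitted_q_iteration(transitions, n_actions, gamma=0.99, n_iterations=50):
    """transitions: list of (state_vector, action_index, reward, next_state_vector, done)."""
    states = np.array([t[0] for t in transitions])
    actions = np.array([t[1] for t in transitions])
    rewards = np.array([t[2] for t in transitions])
    next_states = np.array([t[3] for t in transitions])
    dones = np.array([t[4] for t in transitions], dtype=float)

    # Encode (state, action) pairs as one feature matrix: state features
    # concatenated with a one-hot action indicator.
    def featurize(S, A):
        return np.hstack([S, np.eye(n_actions)[A]])

    X = featurize(states, actions)
    q_model = None
    for _ in range(n_iterations):
        if q_model is None:
            # First pass: the Q-value target is just the immediate reward.
            targets = rewards
        else:
            # Bootstrap: max over actions of the current Q estimate at next states.
            q_next = np.column_stack([
                q_model.predict(featurize(next_states,
                                          np.full(len(next_states), a, dtype=int)))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * (1.0 - dones) * q_next.max(axis=1)
        # Each iteration refits a fresh regressor to the bootstrapped targets
        # (batch-mode fitted value iteration). The paper's second variant instead
        # fits trees to the Bellman residuals (targets minus current predictions).
        q_model = GradientBoostingRegressor(n_estimators=50, max_depth=3)
        q_model.fit(X, targets)
    return q_model
```

The refit-from-scratch step each iteration is what distinguishes this batch fitted scheme from online temporal-difference updates; the relational setting replaces the flat feature vectors with first-order state descriptions and the regressor with relational regression trees.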
