Approximate Policy Iteration with a Policy Language Bias
Fern, Alan, Yoon, Sungwook, Givan, Robert
–Neural Information Processing Systems
We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs.
Neural Information Processing Systems
Dec-31-2004