Kernel-Based Reinforcement Learning on Representative States
Kveton, Branislav (Technicolor Labs) | Theocharous, Georgios (Yahoo Labs)
Markov decision processes (MDPs) are an established framework for solving sequential decision-making problems under uncertainty. In this work, we propose a new method for batch-mode reinforcement learning (RL) with continuous state variables. The method is an approximation to kernel-based RL on a set of k representative states. Similarly to kernel-based RL, our solution is a fixed point of a kernelized Bellman operator and can approximate the optimal solution to an arbitrary level of granularity. Unlike kernel-based RL, our method is fast. In particular, our policies can be computed in O(n) time, where n is the number of training examples. The time complexity of kernel-based RL is Ω(n²). We introduce our method, analyze its convergence, and compare it to existing work. The method is evaluated on two existing control problems with 2 to 4 continuous variables and a new problem with 64 variables. In all cases, we outperform state-of-the-art results and offer simpler solutions.
Jul-21-2012
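To make the abstract's central idea concrete, here is a minimal Python sketch of a kernelized Bellman backup restricted to k representative states. It is not the authors' exact algorithm: the Gaussian kernel, the hard nearest-representative assignment for next-state values, the parameters (bandwidth, gamma, iters), and the toy data in the usage example are all assumptions made for illustration only.

```python
import numpy as np

def gaussian_kernel(x, y, bandwidth=0.5):
    # Hypothetical smoothing kernel; the paper does not prescribe this choice.
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * bandwidth ** 2))

def kernel_rl_on_representatives(S, A, R, S_next, reps,
                                 gamma=0.9, iters=200, bandwidth=0.5):
    """Approximate value iteration with a kernelized Bellman backup
    restricted to k representative states.

    S, A, R, S_next : n batch transitions (state, action, reward, next state)
    reps            : k representative states, e.g. k-means centers of S
    Returns Q, an array of shape (k, num_actions) with Q-values at the representatives.
    """
    n, k = len(S), len(reps)
    actions = np.unique(A)

    # Normalized kernel weights from each representative state to each sample state.
    W = np.array([[gaussian_kernel(r, s, bandwidth) for s in S] for r in reps])
    W /= W.sum(axis=1, keepdims=True)

    # Map each sampled next state to its nearest representative (a simplification;
    # one could instead smooth next-state values with the kernel as well).
    nearest = np.array([int(np.argmin([np.sum((s - r) ** 2) for r in reps]))
                        for s in S_next])

    Q = np.zeros((k, len(actions)))
    for _ in range(iters):
        V_next = Q[nearest].max(axis=1)          # value of each sample's next state
        target = R + gamma * V_next              # one-step Bellman targets
        for a_idx, a in enumerate(actions):
            mask = (A == a)
            Wa = W * mask                        # keep only samples taking action a
            norm = Wa.sum(axis=1, keepdims=True)
            norm[norm == 0.0] = 1.0
            Q[:, a_idx] = (Wa / norm) @ target   # kernel-weighted backup per action
    return Q

# Purely illustrative usage on random data:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 500, 2, 20
    S = rng.uniform(-1, 1, size=(n, d))
    A = rng.integers(0, 2, size=n)
    R = -np.linalg.norm(S, axis=1)                  # reward: stay near the origin
    S_next = np.clip(S + 0.1 * (2 * A[:, None] - 1), -1, 1)
    reps = S[rng.choice(n, size=k, replace=False)]  # crude stand-in for k-means centers
    Q = kernel_rl_on_representatives(S, A, R, S_next, reps)
    print(Q.shape)                                  # (20, 2)
```

In this sketch, the kernel weight matrix is precomputed once, so each backup sweep touches every sample a constant number of times; greedy actions at the representatives then follow directly from the resulting Q-values.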