Batch Value Function Approximation via Support Vectors
Dietterich, Thomas G.; Wang, Xin
Virtually all existing work on value function approximation and policy-gradient methods starts with a parameterized formula for the value function or policy and then seeks to find the best policy that can be represented in that parameterized form. This can give rise to very difficult search problems for which the Bellman equation is of little or no use. In this paper, we take a different approach: rather than fixing the form of the function approximator and searching for a representable policy, we instead identify a good policy and then search for a function approximator that can represent it. Our approach exploits the ability of mathematical programming to represent a variety of constraints, including those that derive from supervised learning, from advantage learning (Baird, 1993), and from the Bellman equation. By combining the kernel trick with mathematical programming, we obtain a function approximator that seeks to find the smallest number of support vectors sufficient to represent the desired policy.
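The following is a minimal, hypothetical sketch (not the authors' exact formulation) of the idea described in the abstract: given a desired policy, fit a kernel value function by mathematical programming subject to advantage-style constraints that make the desired action look better than the alternatives, while minimizing the L1 norm of the kernel weights to encourage a small number of support vectors. The toy chain MDP, RBF kernel, margin value, and use of the cvxpy solver are all assumptions made for illustration.

```python
# Hypothetical sketch: represent a given "good" policy with a sparse kernel
# value function via linear programming (advantage-style constraints + L1 objective).
import numpy as np
import cvxpy as cp

gamma = 0.9                          # discount factor (assumed)
states = np.linspace(0.0, 1.0, 11)   # toy 1-D chain; rightmost state is the goal
actions = [-0.1, +0.1]               # "left" and "right" steps
good_action = +0.1                   # the desired policy always moves right

def step(s, a):
    """Deterministic toy dynamics: move by a, reward 1.0 on reaching the goal."""
    s_next = float(np.clip(s + a, 0.0, 1.0))
    reward = 1.0 if s_next >= 1.0 else 0.0
    return s_next, reward

def rbf(x, c, width=0.15):
    """RBF kernel between a state x and a candidate support-vector center c."""
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

centers = states                     # candidate support vectors = visited states
alpha = cp.Variable(len(centers))    # kernel expansion weights
b = cp.Variable()                    # bias term

def V(s):
    """Kernel expansion of the value function at state s (a CVXPY expression)."""
    k = np.array([rbf(s, c) for c in centers])
    return k @ alpha + b

margin = 0.01
constraints = []
for s in states[:-1]:                # goal state has no meaningful action choice
    s_good, r_good = step(s, good_action)
    q_good = r_good + gamma * V(s_good)
    for a in actions:
        if a == good_action:
            continue
        s_bad, r_bad = step(s, a)
        q_bad = r_bad + gamma * V(s_bad)
        # Advantage-style constraint: the desired action must win by a margin.
        constraints.append(q_good >= q_bad + margin)

# Sparse expansion: minimize the L1 norm of the kernel weights, so the desired
# policy is represented with as few (nonzero) support vectors as possible.
problem = cp.Problem(cp.Minimize(cp.norm1(alpha)), constraints)
problem.solve()

support = np.flatnonzero(np.abs(alpha.value) > 1e-6)
print("support vectors used:", centers[support])
```

In this sketch the constraints only require that the fitted value function makes the given policy greedy; the L1 objective then picks out a small subset of the candidate centers, mirroring the abstract's goal of finding the smallest number of support vectors sufficient to represent the desired policy.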
Neural Information Processing Systems
Dec-31-2002