Batch Value Function Approximation via Support Vectors
Dietterich, Thomas G.; Wang, Xin
Virtually all existing work on value function approximation and policy-gradient methods starts with a parameterized formula for the value function or policy and then seeks to find the best policy that can be represented in that parameterized form. This can give rise to very difficult search problems for which the Bellman equation is of little or no use. In this paper, we take a different approach: rather than fixing the form of the function approximator and searching for a representable policy, we instead identify a good policy and then search for a function approximator that can represent it. Our approach exploits the ability of mathematical programming to represent a variety of constraints, including those that derive from supervised learning, from advantage learning (Baird, 1993), and from the Bellman equation. By combining the kernel trick with mathematical programming, we obtain a function approximator that seeks to find the smallest number of support vectors sufficient to represent the desired policy.
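The following is a minimal, hypothetical sketch (not the authors' exact formulation) of the idea described in the abstract: given a desired policy, fit a kernel value function by mathematical programming subject to advantage-style constraints that make the desired action look better than the alternatives, while minimizing the L1 norm of the kernel weights to encourage a small number of support vectors. The toy chain MDP, RBF kernel, margin value, and use of the cvxpy solver are all assumptions made for illustration.

```python
# Hypothetical sketch: represent a given "good" policy with a sparse kernel
# value function via linear programming (advantage-style constraints + L1 objective).
import numpy as np
import cvxpy as cp

gamma = 0.9                          # discount factor (assumed)
states = np.linspace(0.0, 1.0, 11)   # toy 1-D chain; rightmost state is the goal
actions = [-0.1, +0.1]               # "left" and "right" steps
good_action = +0.1                   # the desired policy always moves right

def step(s, a):
    """Deterministic toy dynamics: move by a, reward 1.0 on reaching the goal."""
    s_next = float(np.clip(s + a, 0.0, 1.0))
    reward = 1.0 if s_next >= 1.0 else 0.0
    return s_next, reward

def rbf(x, c, width=0.15):
    """RBF kernel between a state x and a candidate support-vector center c."""
    return np.exp(-((x - c) ** 2) / (2.0 * width ** 2))

centers = states                     # candidate support vectors = visited states
alpha = cp.Variable(len(centers))    # kernel expansion weights
b = cp.Variable()                    # bias term

def V(s):
    """Kernel expansion of the value function at state s (a CVXPY expression)."""
    k = np.array([rbf(s, c) for c in centers])
    return k @ alpha + b

margin = 0.01
constraints = []
for s in states[:-1]:                # goal state has no meaningful action choice
    s_good, r_good = step(s, good_action)
    q_good = r_good + gamma * V(s_good)
    for a in actions:
        if a == good_action:
            continue
        s_bad, r_bad = step(s, a)
        q_bad = r_bad + gamma * V(s_bad)
        # Advantage-style constraint: the desired action must win by a margin.
        constraints.append(q_good >= q_bad + margin)

# Sparse expansion: minimize the L1 norm of the kernel weights, so the desired
# policy is represented with as few (nonzero) support vectors as possible.
problem = cp.Problem(cp.Minimize(cp.norm1(alpha)), constraints)
problem.solve()

support = np.flatnonzero(np.abs(alpha.value) > 1e-6)
print("support vectors used:", centers[support])
```

In this sketch the constraints only require that the fitted value function makes the given policy greedy; the L1 objective then picks out a small subset of the candidate centers, mirroring the abstract's goal of finding the smallest number of support vectors sufficient to represent the desired policy.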
Neural Information Processing Systems
Dec-31-2002