Gradient Descent for General Reinforcement Learning
Leemon C. Baird III, Andrew W. Moore
Neural Information Processing Systems
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, VAPS also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. It also allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.
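To make the idea of a gradient-descent modification of Q-learning concrete, the sketch below takes one gradient step on the squared Bellman residual for a single observed transition. It is only an illustration in the spirit of the abstract, not the VAPS rule itself, which is not spelled out here. The names `residual_gradient_step` and `bellman_residual`, the two-state problem, the tabular weights, and the constants `GAMMA` and `ALPHA` are all assumptions made for the example, and the transition is assumed to be deterministic.

```python
# Hypothetical illustration (not the paper's VAPS rule): gradient descent on
# the squared Bellman residual, with one weight per (state, action) pair.
GAMMA = 0.9  # discount factor (assumed value)
ALPHA = 0.1  # step size (assumed value)
N_STATES, N_ACTIONS = 2, 2


def q(w, s, a):
    """Q-value of (s, a) under a tabular (one-hot) parameterization."""
    return w[s * N_ACTIONS + a]


def bellman_residual(w, s, a, r, s_next):
    """r + GAMMA * max_b Q(s_next, b) - Q(s, a) for one transition."""
    best_next = max(q(w, s_next, b) for b in range(N_ACTIONS))
    return r + GAMMA * best_next - q(w, s, a)


def residual_gradient_step(w, s, a, r, s_next):
    """Return new weights after one gradient step on 0.5 * residual**2."""
    a_star = max(range(N_ACTIONS), key=lambda b: q(w, s_next, b))
    delta = bellman_residual(w, s, a, r, s_next)
    new_w = list(w)
    # The residual's derivative is -1 with respect to the weight for (s, a)
    # and +GAMMA with respect to the weight for (s_next, a_star).
    new_w[s * N_ACTIONS + a] -= ALPHA * delta * (-1.0)
    new_w[s_next * N_ACTIONS + a_star] -= ALPHA * delta * GAMMA
    return new_w


# One transition: state 0, action 0, reward 1.0, next state 1.
weights = [0.0] * (N_STATES * N_ACTIONS)
before = bellman_residual(weights, 0, 0, 1.0, 1)
weights = residual_gradient_step(weights, 0, 0, 1.0, 1)
after = bellman_residual(weights, 0, 0, 1.0, 1)
print(abs(before), abs(after))
```

Unlike the usual Q-learning update, this step also moves the weight of the successor state-action pair, because the successor value is treated as part of the function being differentiated rather than as a fixed target.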
Dec-31-1999