Gradient Descent for General Reinforcement Learning

Dec-31-1999–Neural Information Processing Systems

A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, it also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. In addition, it allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.
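
The abstract does not spell out the VAPS update, so the following is not that rule. It is only a minimal sketch of the general idea of a value-based update that performs true gradient descent on an error function, here the squared Bellman residual of a linear Q-function on a single transition. The function names, the toy two-state chain, and the step size are all assumptions made for illustration.

```python
def q_value(w, phi):
    """Linear action-value estimate: dot product of weights and features."""
    return sum(wi * xi for wi, xi in zip(w, phi))


def residual_gradient_step(w, phi, reward, next_phis, gamma, alpha):
    """One gradient-descent step on 0.5 * delta**2 for a single transition.

    next_phis holds the feature vectors of every action available in the
    successor state; an empty list marks a terminal successor.
    """
    if next_phis:
        # Features of the greedy action in the successor state.
        best = max(next_phis, key=lambda p: q_value(w, p))
    else:
        best = [0.0] * len(w)
    # Bellman residual for this transition.
    delta = reward + gamma * q_value(w, best) - q_value(w, phi)
    # Gradient of 0.5 * delta**2 w.r.t. w is delta * (gamma * best - phi),
    # treating the greedy action as fixed for this step.
    return [wi - alpha * delta * (gamma * b - x)
            for wi, b, x in zip(w, best, phi)]


def run_chain(sweeps=2000, gamma=0.9, alpha=0.1):
    """Hypothetical toy chain: s0 -> s1 (reward 0) -> terminal (reward 1)."""
    s0, s1 = [1.0, 0.0], [0.0, 1.0]  # one-hot features, one action per state
    w = [0.0, 0.0]
    for _ in range(sweeps):
        w = residual_gradient_step(w, s0, 0.0, [s1], gamma, alpha)
        w = residual_gradient_step(w, s1, 1.0, [], gamma, alpha)
    return w
```

On this deterministic chain the residuals are zero only at w = [0.9, 1.0], so `run_chain()` should settle near those values. Unlike the standard Q-learning update, the step also moves the successor state's weights, which is what makes it a gradient of a fixed objective.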

artificial intelligence, machine learning, reinforcement learning

Country:
- North America > United States
  - Pennsylvania > Allegheny County
    - Pittsburgh (0.15)
  - Massachusetts > Hampshire County
    - Amherst (0.14)
  - California > San Francisco County
    - San Francisco (0.14)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.95)
