Gradient Descent for General Reinforcement Learning

Dec-31-1999–Neural Information Processing Systems

A simple learning rule, the VAPS algorithm, is derived that can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. All of them have guaranteed convergence, and they include modifications of several existing algorithms that were known to fail to converge on simple MDPs, including Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, the rule also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function. It also allows policy-search and value-based algorithms to be combined, thus unifying two very different approaches to reinforcement learning into a single Value and Policy Search (VAPS) algorithm.
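
The abstract does not state the learning rule itself, so the following is only a minimal sketch of one way a combined value-and-policy gradient rule of this kind could look, under our own assumptions. We assume a per-step error e that mixes a SARSA-style squared TD error with a policy-search error b - gamma^t * r, weighted by a hypothetical mixing parameter beta, and an update that follows the gradient of the expected error: the direct gradient of e plus e times a running sum ("trace") of d/dw ln P(u|x) over the actions taken. The chain MDP, the Boltzmann policy over tabular Q-values, and the names alpha, beta, gamma, b, tau, step, vaps_episode and train are all hypothetical illustrations, not taken from the paper.

```python
import numpy as np


def boltzmann(q_row, tau):
    """Boltzmann (softmax) action probabilities for one state's Q-values."""
    z = (q_row - q_row.max()) / tau  # shift by the max for numerical stability
    p = np.exp(z)
    return p / p.sum()


def grad_log_pi(w, x, u, tau):
    """Gradient of ln P(u | x) w.r.t. the tabular weights w (used as Q-values)."""
    g = np.zeros_like(w)
    g[x] = -boltzmann(w[x], tau) / tau
    g[x, u] += 1.0 / tau
    return g


def step(x, u, n_states):
    """Hypothetical deterministic chain: action 1 moves right, 0 moves left.
    Reaching the last state pays reward 1 and ends the episode."""
    x_next = min(x + 1, n_states - 1) if u == 1 else max(x - 1, 0)
    done = x_next == n_states - 1
    return x_next, (1.0 if done else 0.0), done


def vaps_episode(w, rng, n_states=5, alpha=0.01, beta=0.5, gamma=0.9,
                 b=0.0, tau=1.0, max_steps=20):
    """Run one episode and return the weights after an ASSUMED VAPS-style update.

    beta=0 gives a pure value-based (SARSA-error) rule; beta=1 gives pure
    policy search, where w act only as policy logits and no value is learned."""
    x = 0
    u = rng.choice(2, p=boltzmann(w[x], tau))
    trace = grad_log_pi(w, x, u, tau)  # running sum of d/dw ln P(u|x)
    dw = np.zeros_like(w)  # updates are accumulated and applied after the episode
    for t in range(max_steps):
        x_next, r, done = step(x, u, n_states)
        grad_e = np.zeros_like(w)
        if done:
            delta = r - w[x, u]  # terminal transition: no bootstrap term
        else:
            u_next = rng.choice(2, p=boltzmann(w[x_next], tau))
            trace += grad_log_pi(w, x_next, u_next, tau)
            delta = r + gamma * w[x_next, u_next] - w[x, u]
            grad_e[x_next, u_next] += (1 - beta) * gamma * delta
        grad_e[x, u] -= (1 - beta) * delta
        # Mixed error: SARSA squared TD error plus policy-search error b - gamma^t r.
        e = (1 - beta) * 0.5 * delta ** 2 + beta * (b - gamma ** t * r)
        # Direct gradient of e plus the likelihood-ratio (trace) term.
        dw -= alpha * (grad_e + e * trace)
        if done:
            break
        x, u = x_next, u_next
    return w + dw


def train(episodes=500, n_states=5, seed=0, **kwargs):
    """Repeat vaps_episode from zero weights; kwargs go to vaps_episode."""
    rng = np.random.default_rng(seed)
    w = np.zeros((n_states, 2))
    for _ in range(episodes):
        w = vaps_episode(w, rng, n_states=n_states, **kwargs)
    return w
```

Setting beta between 0 and 1 is how this sketch expresses "combining" value-based and policy-search learning in one rule; the trace term is what accounts for the policy changing as the weights change. Whether this matches the paper's exact error definitions and trace should be checked against the full text.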
