Gradient Descent for General Reinforcement Learning

Apr-6-2023, 17:34:35 GMT–Neural Information Processing Systems

A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement-learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. Simulation results are given, and several areas for future research are discussed.
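
The abstract does not state the VAPS update itself, so what follows is only a hedged illustration of the underlying idea. It treats a SARSA-style Bellman error as a loss and follows its true gradient, which makes learning ordinary gradient descent on a fixed objective. This is closer to a residual-gradient update than to the full VAPS rule. Everything in it is a hypothetical choice for this sketch: the helper names `bellman_errors` and `residual_gradient_sarsa`, the two-state cycle, the one-hot features, and the values of `gamma`, `alpha`, and `iters`. It also assumes deterministic transitions. With stochastic transitions, an unbiased estimate of this gradient needs two independent samples of the successor state.

```python
# Hypothetical sketch: full-batch gradient descent on the summed squared
# SARSA-style Bellman error of a linear Q-function (a residual-gradient-style
# update, NOT the paper's VAPS rule). All names and numbers are illustrative.
from typing import List, Tuple

# One experienced transition: (features of (s, a), reward, features of (s', a'))
Transition = Tuple[List[float], float, List[float]]


def dot(w: List[float], x: List[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def bellman_errors(w: List[float], transitions: List[Transition], gamma: float) -> List[float]:
    # delta = r + gamma * Q(s', a') - Q(s, a), with Q(s, a) = w . phi(s, a)
    return [r + gamma * dot(w, phi_next) - dot(w, phi) for phi, r, phi_next in transitions]


def residual_gradient_sarsa(
    transitions: List[Transition],
    n_features: int,
    gamma: float = 0.9,
    alpha: float = 0.5,
    iters: int = 5000,
) -> List[float]:
    w = [0.0] * n_features
    for _ in range(iters):
        grad = [0.0] * n_features
        deltas = bellman_errors(w, transitions, gamma)
        for (phi, _r, phi_next), delta in zip(transitions, deltas):
            # Gradient of 0.5 * delta^2 w.r.t. w is delta * (gamma * phi_next - phi);
            # unlike ordinary SARSA, the successor term is differentiated too.
            for j in range(n_features):
                grad[j] += delta * (gamma * phi_next[j] - phi[j])
        # Plain gradient descent step on the summed squared error
        w = [wj - alpha * gj for wj, gj in zip(w, grad)]
    return w


# Usage: a deterministic two-step cycle under a fixed policy,
# (s0, a0) -> (s1, a1) -> (s0, a0), with one-hot (tabular) features.
phi0, phi1 = [1.0, 0.0], [0.0, 1.0]
cycle: List[Transition] = [(phi0, 1.0, phi1), (phi1, 0.0, phi0)]
w = residual_gradient_sarsa(cycle, n_features=2)
print([round(x, 4) for x in w])  # [5.2632, 4.7368]
```

Because this loss is a convex quadratic in `w`, a small enough fixed step size makes the iteration converge. On the tabular cycle it recovers the exact fixed point Q(s0, a0) = 1/(1 - γ²) and Q(s1, a1) = γ/(1 - γ²) for γ = 0.9.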

algorithm, converge, general reinforcement learning
