Gradient Descent for General Reinforcement Learning
–Neural Information Processing Systems
A simple learning rule is derived, the VAPS algorithm, which can be instantiated to generate a wide range of new reinforcement(cid:173) learning algorithms. These algorithms solve a number of open problems, define several new approaches to reinforcement learning, and unify different approaches to reinforcement learning under a single theory. These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MOPs. These include Q(cid:173) In addition to these learning, SARSA, and advantage learning. Simulations results are given, and several areas for future research are discussed.
Neural Information Processing Systems
Apr-6-2023, 17:34:35 GMT
- Technology: