Gradient Descent for General Reinforcement Learning

Leemon C. Baird III, Andrew W. Moore

Neural Information Processing Systems 

These algorithms all have guaranteed convergence, and include modifications of several existing algorithms that were known to fail to converge on simple MDPs. These include Q-learning, SARSA, and advantage learning. In addition to these value-based algorithms, the approach also generates pure policy-search reinforcement-learning algorithms, which learn optimal policies without learning a value function.
