Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Neural Information Processing Systems 

Our framework is the infinite-horizon discounted Markov Decision Process (MDP).
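
As a rough illustration of this setting, the sketch below builds a tiny discounted MDP and computes its optimal value function by value iteration. The two-state MDP, its rewards, the discount factor 0.9, and the helper name `value_iteration` are all hypothetical choices made for this example; they are not taken from the paper.

```python
# Hypothetical two-state, two-action MDP; every number here is illustrative only.
GAMMA = 0.9  # discount factor, assumed to lie in [0, 1)

# P[s][a] is a list of (next_state, probability) pairs.
P = {
    0: {0: [(0, 1.0)], 1: [(1, 1.0)]},
    1: {0: [(0, 1.0)], 1: [(1, 1.0)]},
}

# R[s][a] is the immediate reward for taking action a in state s.
R = {
    0: {0: 0.0, 1: 0.0},
    1: {0: 0.0, 1: 1.0},
}


def value_iteration(P, R, gamma, tol=1e-10):
    """Repeatedly apply the Bellman optimality operator until the update is below tol."""
    v = {s: 0.0 for s in P}
    while True:
        new_v = {
            s: max(
                R[s][a] + gamma * sum(p * v[s2] for s2, p in P[s][a])
                for a in P[s]
            )
            for s in P
        }
        if max(abs(new_v[s] - v[s]) for s in P) < tol:
            return new_v
        v = new_v


v_star = value_iteration(P, R, GAMMA)
print(v_star)
```

In this toy example, staying in state 1 earns reward 1 at every step, so its optimal value is 1 / (1 - 0.9) = 10, and state 0 reaches it after one unrewarded step, giving 0.9 * 10 = 9.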
