Multiple-Step Greedy Policies in Approximate and Online Reinforcement Learning

Yonathan Efroni, Gal Dalal, Bruno Scherrer, Shie Mannor

Neural Information Processing Systems 

Neural Information Processing Systems http://nips.cc/