Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning
