Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning