Temporal-difference learning for nonlinear value function approximation in the lazy training regime
