Exploring TD error as a heuristic for $\sigma$ selection in Q($\sigma$, $\lambda$)
