Re-evaluating Complex Backups in Temporal Difference Learning

Neural Information Processing Systems 

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n.
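The claim can be illustrated numerically with a minimal sketch. It relies on assumptions not stated in the text above: the n-step returns are treated as independent, unbiased Gaussian estimates of the same value, in which case the maximum likelihood estimate is the inverse-variance weighted average, and the variance of the n-step return is taken to grow geometrically as beta * lam**(-n). This variance form is inferred from the λ-return weights (1 - λ)λ^(n-1) and is not quoted from the source. All function names, `beta`, and the truncation length `n_max` are hypothetical and chosen for illustration.

```python
def lambda_return_weights(lam, n_max):
    """Lambda-return weights (1 - lam) * lam**(n - 1), renormalized over n = 1..n_max."""
    raw = [(1.0 - lam) * lam ** (n - 1) for n in range(1, n_max + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def geometric_variances(lam, n_max, beta=1.0):
    """Assumed variance model: Var[n-step return] = beta * lam**(-n)."""
    return [beta * lam ** (-n) for n in range(1, n_max + 1)]


def inverse_variance_weights(variances):
    """MLE weights for independent Gaussian estimates sharing one mean."""
    inv = [1.0 / v for v in variances]
    total = sum(inv)
    return [w / total for w in inv]


lam, n_max = 0.8, 10
w_lambda = lambda_return_weights(lam, n_max)
w_mle = inverse_variance_weights(geometric_variances(lam, n_max))

# Largest elementwise difference between the two weightings.
max_gap = max(abs(a - b) for a, b in zip(w_lambda, w_mle))
print(f"max weight difference: {max_gap:.2e}")
```

Under these assumptions the two weight vectors agree up to floating-point error, and the scale `beta` cancels in the normalization, so only the geometric growth rate of the variance matters.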