Re-evaluating Complex Backups in Temporal Difference Learning
–Neural Information Processing Systems
We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n.
Neural Information Processing Systems
Mar-15-2024, 14:54:38 GMT
- Country:
- North America > United States > Massachusetts
- Hampshire County > Amherst (0.14)
- Middlesex County > Cambridge (0.14)
- North America > United States > Massachusetts