Re-evaluating Complex Backups in Temporal Difference Learning

Mar-15-2024, 14:54:38 GMT–Neural Information Processing Systems

We show that the λ-return target used in the TD(λ) family of algorithms is the maximum likelihood estimator for a specific model of how the variance of an n-step return estimate increases with n.
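The claim above can be illustrated with a minimal sketch. It rests on assumptions the summary does not spell out: the n-step returns are treated as independent Gaussian estimates of the same value, so that their maximum likelihood combination is the inverse-variance weighted average, and the variance is assumed to grow geometrically as λ^(-n). The function names and the episodic setup (terminal value 0) are hypothetical and chosen only for illustration.

```python
def n_step_returns(rewards, values, gamma):
    """n-step returns G^(1)..G^(T) from time 0 for an episode of length T.

    rewards[t] is the reward received after step t; values[t] is the
    current estimate V(s_t). The terminal state's value is taken to be 0.
    """
    T = len(rewards)
    returns = []
    discounted_sum = 0.0
    for n in range(1, T + 1):
        discounted_sum += gamma ** (n - 1) * rewards[n - 1]
        bootstrap = values[n] if n < T else 0.0
        returns.append(discounted_sum + gamma ** n * bootstrap)
    return returns


def lambda_return(rewards, values, gamma, lam):
    """Episodic lambda-return: geometric weights (1 - lam) * lam^(n-1) on
    the n-step returns, with the leftover mass on the full return."""
    G = n_step_returns(rewards, values, gamma)
    T = len(G)
    weights = [(1 - lam) * lam ** (n - 1) for n in range(1, T)]
    weights.append(lam ** (T - 1))
    return sum(w * g for w, g in zip(weights, G))


def geometric_precision_weights(lam, N):
    """Normalized inverse-variance weights under the ASSUMED variance model
    Var[G^(n)] proportional to lam^(-n), for 0 < lam < 1 and n = 1..N."""
    precisions = [lam ** n for n in range(1, N + 1)]
    total = sum(precisions)
    return [p / total for p in precisions]


if __name__ == "__main__":
    rewards = [1.0, 1.0, 1.0]
    values = [0.5, 0.5, 0.5]
    print(lambda_return(rewards, values, gamma=1.0, lam=0.5))
    print(geometric_precision_weights(0.5, 4))
```

Under this variance model each successive weight is λ times the previous one, which is the same geometric decay the λ-return applies to the n-step returns. With finitely many returns the normalization differs from the episodic λ-return, which places the remaining weight on the full return.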

Country:
- Asia > Singapore (0.04)
- North America > United States
  - Massachusetts
    - Middlesex County > Cambridge (0.14)
    - Hampshire County > Amherst (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Uncertainty
    - Bayesian Inference (0.35)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Learning Graphical Models > Directed Networks
      - Bayesian Learning (0.35)