Hindsight Expectation Maximization for Goal-conditioned Reinforcement Learning

Jun-12-2020–arXiv.org Machine Learning

We propose a graphical model framework for goal-conditioned RL, with an EM algorithm that operates on the lower bound of the RL objective. The E-step provides a natural interpretation of how 'learning in hindsight' techniques, such as HER, handle extremely sparse goal-conditioned rewards. The M-step reduces policy optimization to supervised learning updates, which greatly stabilizes end-to-end training on high-dimensional inputs such as images. We show that the combined algorithm, hEM, significantly outperforms model-free baselines on a wide range of goal-conditioned benchmarks with sparse rewards.
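
The two ideas named in the abstract, relabeling goals in hindsight and fitting the policy by supervised learning, can be illustrated with a toy sketch. This is not the paper's hEM algorithm: the 1-D chain environment, the "final state as goal" relabeling rule, the majority-vote tabular policy, and every function name below are assumptions chosen only to make the idea concrete.

```python
import random
from collections import Counter, defaultdict

def rollout(policy, start, goal, horizon, n_states, rng):
    """Run one episode on a hypothetical 1-D chain; actions are -1 or +1."""
    state, traj = start, []
    for _ in range(horizon):
        action = policy(state, goal, rng)
        traj.append((state, action))
        # Clip the next state so the agent stays on the chain.
        state = min(max(state + action, 0), n_states - 1)
    return traj, state  # visited (state, action) pairs and the final state

def relabel_in_hindsight(traj, final_state):
    """Hindsight relabeling (assumed rule): pretend the state that was
    actually reached had been the goal, so the episode counts as a success."""
    return [(s, final_state, a) for s, a in traj]

def fit_policy(dataset):
    """Supervised update (assumed form): for each (state, goal) pair,
    imitate the action seen most often in the relabeled data."""
    counts = defaultdict(Counter)
    for s, g, a in dataset:
        counts[(s, g)][a] += 1
    table = {key: c.most_common(1)[0][0] for key, c in counts.items()}

    def policy(state, goal, rng):
        # Fall back to a random action for unseen (state, goal) pairs.
        return table.get((state, goal), rng.choice([-1, 1]))

    return policy

def random_policy(state, goal, rng):
    """Initial exploration policy that ignores the goal."""
    return rng.choice([-1, 1])

if __name__ == "__main__":
    rng = random.Random(0)
    data = []
    for _ in range(200):
        traj, final_state = rollout(random_policy, 5, 9, 6, 10, rng)
        data.extend(relabel_in_hindsight(traj, final_state))
    learned = fit_policy(data)
    print(learned(5, 9, rng))
```

Even when the original goal is never reached, every relabeled episode supplies successful (state, goal, action) examples, which is why the sparse reward stops being an obstacle in this toy setting. The policy update here is plain counting; a neural policy trained with a classification or regression loss would play the same role for high-dimensional inputs.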

Country:
- North America > United States
  - Illinois > Cook County
    - Chicago (0.04)
  - California > Monterey County
    - Pacific Grove (0.04)

Genre:
- Research Report (0.64)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Reinforcement Learning (1.00)
  - Statistical Learning (0.88)
