Outcome-Driven Reinforcement Learning via Variational Inference

Rudner, Tim G. J.; Pong, Vitchyr H.; McAllister, Rowan; Gal, Yarin; Levine, Sergey

Apr-20-2021, arXiv.org Artificial Intelligence

While reinforcement learning algorithms provide automated acquisition of optimal policies, practical application of such methods requires a number of design decisions, such as manually designing reward functions that not only define the task, but also provide sufficient shaping to accomplish it. In this paper, we discuss a new perspective on reinforcement learning, recasting it as the problem of inferring actions that achieve desired outcomes, rather than a problem of maximizing rewards. To solve the resulting outcome-directed inference problem, we establish a novel variational inference formulation that allows us to derive a well-shaped reward function which can be learned directly from environment interactions. From the corresponding variational objective, we also derive a new probabilistic Bellman backup operator reminiscent of the standard Bellman backup operator and use it to develop an off-policy algorithm to solve goal-directed tasks. We empirically demonstrate that this method eliminates the need to design reward functions and leads to effective goal-directed behaviors.

equation, objective, outcome-driven reinforcement learning

Country:
- North America > United States
  - California > Alameda County > Berkeley (0.14)
- Europe > United Kingdom
  - England > Oxfordshire > Oxford (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)