Estimating Q(s,s') with Deep Deterministic Dynamics Gradients

Edwards, Ashley D., Sahni, Himanshu, Liu, Rosanne, Hung, Jane, Jain, Ankit, Wang, Rui, Ecoffet, Adrien, Miconi, Thomas, Isbell, Charles, Yosinski, Jason

Feb-21-2020–arXiv.org Artificial Intelligence

In this paper, we introduce a novel form of value function, $Q(s, s')$, that expresses the utility of transitioning from a state $s$ to a neighboring state $s'$ and then acting optimally thereafter. In order to derive an optimal policy, we develop a forward dynamics model that learns to make next-state predictions that maximize this value. This formulation decouples actions from values while still learning off-policy. We highlight the benefits of this approach in terms of value function transfer, learning within redundant action spaces, and learning off-policy from state observations generated by sub-optimal or completely random policies. Code and videos are available at \url{sites.google.com/view/qss-paper}.

dynamic model, experiment, qss, (10 more...)

arXiv.org Artificial Intelligence

Feb-21-2020

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East > Jordan (0.04)

Genre:
- Research Report (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Robots (1.00)
  - Machine Learning
    - Reinforcement Learning (0.72)
    - Neural Networks (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found