Bootstrapping the Expressivity with Model-based Planning

Oct-14-2019–arXiv.org Artificial Intelligence

We compare the model-free reinforcement learning with the model-based approaches through the lens of the expressive power of neural networks for policies, $Q$-functions, and dynamics. We show, theoretically and empirically, that even for one-dimensional continuous state space, there are many MDPs whose optimal $Q$-functions and policies are much more complex than the dynamics. We hypothesize many real-world MDPs also have a similar property. For these MDPs, model-based planning is a favorable algorithm, because the resulting policies can approximate the optimal policy significantly better than a neural network parameterization can, and model-free or model-based policy optimization rely on policy parameterization. Motivated by the theory, we apply a simple multi-step model-based bootstrapping planner (BOOTS) to bootstrap a weak $Q$-function into a stronger policy. Empirical results show that applying BOOTS on top of model-based or model-free policy optimization algorithms at the test time improves the performance on MuJoCo benchmark tasks.

algorithm, neural network, q-function, (15 more...)

arXiv.org Artificial Intelligence

Oct-14-2019

arXiv.org PDF

Add feedback

Country:
- North America
  - Canada (0.04)
  - United States
    - Louisiana > Orleans Parish
      - New Orleans (0.04)
    - California > Santa Clara County
      - Palo Alto (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report > New Finding (0.66)

Industry:
- Leisure & Entertainment (0.67)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (1.00)
  - Machine Learning
    - Reinforcement Learning (1.00)
    - Neural Networks (1.00)
    - Performance Analysis > Accuracy (0.61)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found