Policy Optimization via Importance Sampling

Metelli, Alberto Maria, Papini, Matteo, Faccio, Francesco, Restelli, Marcello

Sep-17-2018–arXiv.org Machine Learning

Policy optimization is an effective reinforcement learning approach to solve continuous control tasks. Recent achievements have shown that alternating on-line and off-line optimization is a successful choice for efficient trajectory reuse. However, deciding when to stop optimizing and collect new trajectories is non-trivial as it requires to account for the variance of the objective function estimate. In this paper, we propose a novel model-free policy search algorithm, POIS, applicable in both control-based and parameter-based settings. We first derive a high-confidence bound for importance sampling estimation and then we define a surrogate objective function which is optimized off-line using a batch of trajectories. Finally, the algorithm is tested on a selection of continuous control tasks, with both linear and deep policies, and compared with the state-of-the-art policy optimization methods.

neural network, optimization problem, variance, (20 more...)

arXiv.org Machine Learning

Sep-17-2018

arXiv.org PDF

Add feedback

Country:
- Europe (0.46)

Genre:
- Research Report (0.70)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning
    - Neural Networks (1.00)
    - Reinforcement Learning (0.87)
  - Representation & Reasoning > Optimization (1.00)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found