Smoothing Policies and Safe Policy Gradients
Papini, Matteo, Pirotta, Matteo, Restelli, Marcello
Policy gradient algorithms are among the best candidates for the much-anticipated application of reinforcement learning to real-world control tasks, such as those arising in robotics. However, the trial-and-error nature of these methods introduces safety issues whenever the learning phase itself must be performed on a physical system. In this paper, we address a specific safety formulation, where danger is encoded in the reward signal and the learning agent is constrained to never worsen its performance. By studying actor-only policy gradient from a stochastic optimization perspective, we establish improvement guarantees for a wide class of parametric policies, generalizing existing results on Gaussian policies. This, together with novel upper bounds on the variance of policy gradient estimators, allows us to identify the meta-parameter schedules that guarantee monotonic improvement with high probability. The two key meta-parameters are the step size of the parameter updates and the batch size of the gradient estimators. By jointly and adaptively selecting these meta-parameters, we obtain a safe policy gradient algorithm.
May-8-2019
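The abstract describes an actor-only policy gradient loop in which the step size and the batch size are chosen jointly so that each update improves performance with high probability. The sketch below is only an illustration of that idea, not the authors' algorithm: the toy 1-D environment, the function names (`rollout_grad`, `safe_pg`), the confidence half-width heuristic, and the assumed smoothness constant `lipschitz` are all invented for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

def rollout_grad(theta, sigma=0.5, horizon=20):
    """Single-episode REINFORCE gradient estimate for a toy 1-D
    linear-Gaussian policy a ~ N(theta * s, sigma^2) on a scalar
    system with quadratic cost (illustrative only, not the paper's
    setting)."""
    s, ret, score = 1.0, 0.0, 0.0
    for _ in range(horizon):
        a = theta * s + sigma * rng.standard_normal()
        ret += -(s ** 2 + 0.1 * a ** 2)            # reward
        score += (a - theta * s) * s / sigma ** 2  # d log pi / d theta
        s = 0.9 * s + a                            # simple dynamics
    return score * ret                             # REINFORCE estimator

def safe_pg(theta=0.0, iters=50, delta=0.1, lipschitz=50.0,
            min_batch=10, max_batch=2000):
    """Policy gradient with jointly adapted step size and batch size.
    The rules below are a simplified heuristic in the spirit of the
    abstract: the batch grows until the estimation error (via a
    high-probability half-width) is small relative to the gradient
    estimate, and the step is scaled by an assumed smoothness constant
    `lipschitz`.  This is not the authors' exact schedule."""
    for it in range(iters):
        grads = [rollout_grad(theta) for _ in range(min_batch)]
        while True:
            g_hat = np.mean(grads)
            half_width = np.std(grads, ddof=1) * np.sqrt(
                2.0 * np.log(2.0 / delta) / len(grads))
            # Stop enlarging the batch once the confidence half-width
            # is below half the estimated gradient magnitude.
            if half_width <= 0.5 * abs(g_hat) or len(grads) >= max_batch:
                break
            grads.extend(rollout_grad(theta) for _ in range(len(grads)))
        # Conservative step: use only the part of the gradient that is
        # (w.h.p.) guaranteed to point in the improving direction.
        safe_norm = max(abs(g_hat) - half_width, 0.0)
        step = safe_norm / lipschitz
        theta += step * np.sign(g_hat)
        print(f"iter {it:2d}  batch {len(grads):4d}  "
              f"theta {theta:+.3f}  step {step:.4f}")
    return theta

if __name__ == "__main__":
    safe_pg()
```

The design choice mirrored here is the one the abstract emphasizes: when the gradient estimate is noisy, the scheme reacts by enlarging the batch or shrinking the step rather than risking a performance-degrading update.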