First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

Zhang, Yiming, Vuong, Quan, Ross, Keith W.

Feb-16-2020–arXiv.org Artificial Intelligence

In reinforcement learning, an agent attempts to learn high-performing behaviors by interacting with the environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as those deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS), which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach provides a guarantee for constraint satisfaction throughout training and is first-order in nature, and therefore simple to implement. We provide empirical evidence that our algorithm achieves better performance on a set of constrained robotic locomotion tasks compared to current state-of-the-art approaches.
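The two-step structure described in the abstract can be illustrated with a toy sketch for a single state with a handful of discrete actions. This is not the authors' implementation: the abstract gives no formulas, so the sketch assumes the nonparameterized step is a KL-regularized problem with a fixed cost multiplier, whose solution is an exponentially re-weighted copy of the current policy, and it assumes the projection step minimizes the KL divergence from a softmax policy to that target by plain gradient descent. The names `tilt_policy`, `project`, `lam`, and `nu` are hypothetical.

```python
import math

def tilt_policy(probs, adv_reward, adv_cost, lam=1.0, nu=0.5):
    """Step 1 (assumed form): closed-form maximizer of
    E_q[adv_reward - nu * adv_cost] - lam * KL(q || probs)
    over all distributions q on the action set."""
    weights = [p * math.exp((ar - nu * ac) / lam)
               for p, ar, ac in zip(probs, adv_reward, adv_cost)]
    total = sum(weights)
    return [w / total for w in weights]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    """KL(p || q) for two discrete distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def project(target, logits, lr=0.5, steps=2000):
    """Step 2 (assumed form): first-order projection, i.e. gradient
    descent on KL(softmax(logits) || target) with respect to the logits."""
    logits = list(logits)
    for _ in range(steps):
        pi = softmax(logits)
        d = kl(pi, target)
        # d KL / d logit_j = pi_j * (log(pi_j / target_j) - KL)
        grad = [p * (math.log(p / t) - d) for p, t in zip(pi, target)]
        logits = [x - lr * g for x, g in zip(logits, grad)]
    return logits

# Hypothetical toy numbers: action 0 has the highest reward advantage
# but also incurs cost.
current = [0.5, 0.3, 0.2]
adv_r = [1.0, 0.0, -0.5]
adv_c = [2.0, 0.0, 0.0]

target = tilt_policy(current, adv_r, adv_c, lam=1.0, nu=0.5)
theta = project(target, [math.log(p) for p in current])
print("update policy:", target)
print("projected policy:", softmax(theta))
```

In this toy setting the softmax family can represent any distribution over the actions, so the projection recovers the target almost exactly. With a neural-network policy over many states the projection would only be approximate, and the cost multiplier would have to be adapted so that the cost constraint holds; both are outside the scope of this sketch.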

algorithm, constraint, order optimization


Country:
- North America > United States
  - New York (0.04)
  - California > San Diego County
    - San Diego (0.04)
  - Arizona > Maricopa County
    - Phoenix (0.04)
- Asia
  - Middle East > Jordan (0.04)
  - China > Shanghai
    - Shanghai (0.04)

Genre:
- Overview > Innovation (0.54)
- Research Report
  - New Finding (0.68)
  - Promising Solution (0.68)

Industry:
- Leisure & Entertainment > Games (0.46)
- Transportation (0.35)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning > Optimization (1.00)
  - Machine Learning > Reinforcement Learning (1.00)
