First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning
Zhang, Yiming, Vuong, Quan, Ross, Keith W.
–arXiv.org Artificial Intelligence
In reinforcement learning, an agent attempts to learn high-performing behaviors by interacting with the environment; such behaviors are often quantified in the form of a reward function. However, some aspects of behavior, such as those deemed unsafe and to be avoided, are best captured through constraints. We propose a novel approach called First Order Constrained Optimization in Policy Space (FOCOPS), which maximizes an agent's overall reward while ensuring the agent satisfies a set of cost constraints. Using data generated from the current policy, FOCOPS first finds the optimal update policy by solving a constrained optimization problem in the nonparameterized policy space. FOCOPS then projects the update policy back into the parametric policy space. Our approach provides a guarantee for constraint satisfaction throughout training and is first-order in nature, and therefore extremely simple to implement. We provide empirical evidence that our algorithm achieves better performance on a set of constrained robotic locomotion tasks than current state-of-the-art approaches.
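The two-step structure described in the abstract (solve for an update policy in nonparameterized policy space, then project it back onto a parametric family) can be illustrated on a toy problem. The sketch below is not taken from the abstract: it assumes a single state with three discrete actions, assumes the update policy is the current policy reweighted exponentially by reward advantage minus a cost penalty, and uses a softmax policy for the projection. The names `lam` (temperature), `nu` (cost penalty weight, held fixed here), and both helper functions are hypothetical and chosen only for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D array of logits.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def nonparametric_update(pi_old, adv_reward, adv_cost, lam=1.0, nu=1.0):
    # Step 1 (assumed form, not stated in the abstract): reweight the current
    # policy toward high reward advantage and away from high cost advantage,
    # then renormalize so the result is a probability distribution.
    weights = pi_old * np.exp((adv_reward - nu * adv_cost) / lam)
    return weights / weights.sum()

def project_to_parametric(theta, pi_target, lr=0.5, steps=500):
    # Step 2: plain gradient descent (first-order only) on
    # KL(pi_theta || pi_target) for a softmax policy with logits theta.
    for _ in range(steps):
        pi = softmax(theta)
        log_ratio = np.log(pi) - np.log(pi_target)
        kl = np.sum(pi * log_ratio)
        # Exact gradient of the KL with respect to the softmax logits.
        grad = pi * (log_ratio - kl)
        theta = theta - lr * grad
    return theta

# Toy data (made up): action 0 has the best reward advantage but also the
# highest cost advantage.
pi_old = np.array([0.5, 0.3, 0.2])
adv_reward = np.array([1.0, 0.0, -1.0])
adv_cost = np.array([2.0, 0.0, -1.0])

pi_star = nonparametric_update(pi_old, adv_reward, adv_cost)
theta = project_to_parametric(np.log(pi_old), pi_star)
pi_proj = softmax(theta)

print("update policy:   ", pi_star)
print("projected policy:", pi_proj)
```

In this toy setting the softmax family can represent the update policy exactly, so the projection recovers it. With a neural-network policy over many states the projection would only be approximate, and the cost penalty weight would need to be adapted rather than fixed.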
Feb-16-2020
- Country:
  - North America > United States
    - New York (0.04)
    - California > San Diego County
      - San Diego (0.04)
    - Arizona > Maricopa County
      - Phoenix (0.04)
  - Asia
    - Middle East > Jordan (0.04)
    - China > Shanghai
      - Shanghai (0.04)
- Genre:
  - Overview > Innovation (0.54)
  - Research Report
    - New Finding (0.68)
    - Promising Solution (0.68)
- Industry:
  - Leisure & Entertainment > Games (0.46)
  - Transportation (0.35)
- Technology: