Policy Optimization for Continuous Reinforcement Learning