LearningtoConstrainPolicyOptimizationwith VirtualTrustRegion

Neural Information Processing Systems 

ComparedtoDeepQ-learning,deeppolicygradient (PG) methods are often more flexible and applicable to discrete and continuous action problems. However, these methods tend to suffer from high sample complexity and training instability since the gradient may not accurately reflect the policy gain when the policy changes substantially [6].

Similar Docs  Excel Report  more

TitleSimilaritySource
None found