LearningtoConstrainPolicyOptimizationwith VirtualTrustRegion
–Neural Information Processing Systems
ComparedtoDeepQ-learning,deeppolicygradient (PG) methods are often more flexible and applicable to discrete and continuous action problems. However, these methods tend to suffer from high sample complexity and training instability since the gradient may not accurately reflect the policy gain when the policy changes substantially [6].
Neural Information Processing Systems
Feb-9-2026, 00:07:39 GMT
- Technology:
- Information Technology > Artificial Intelligence
- Machine Learning (1.00)
- Representation & Reasoning > Optimization (0.46)
- Robots (0.68)
- Information Technology > Artificial Intelligence