Learning to Constrain Policy Optimization with Virtual Trust Region

Feb-9-2026, 00:07:39 GMT – Neural Information Processing Systems

Compared to Deep Q-learning, deep policy gradient (PG) methods are often more flexible and applicable to both discrete and continuous action problems. However, these methods tend to suffer from high sample complexity and training instability, since the gradient may not accurately reflect the policy gain when the policy changes substantially [6].

artificial intelligence, machine learning, virtual policy

Country:
- Oceania > Australia (0.14)

Technology:
- Information Technology > Artificial Intelligence
  - Machine Learning (1.00)
  - Robots (0.68)
  - Representation & Reasoning > Optimization (0.46)
