Constraint-Conditioned Policy Optimization for Versatile Safe Reinforcement Learning

Neural Information Processing Systems 

Constraint-Conditioned Policy Optimization (CCPO) can generalize to diverse, unseen constraint thresholds without retraining the policy.
