HowtoLearnaUsefulCritic?Model-based Action-Gradient-EstimatorPolicyOptimization

Neural Information Processing Systems 

However, instead of gradients, the critic is, typically, only trained to accurately predict expected returns, which, on their own, are useless for policy optimization.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found