UnderstandingtheEffectofStochasticity inPolicyOptimization

Neural Information Processing Systems 

Until recently it had generally been assumed thatmethods based onfollowingthepolicygradient (PG)[1]could notbeguaranteed toconverge to globally optimal solutions, given that the policy value function is not concave.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found