UnifyingGradientEstimatorsforMeta-Reinforcement LearningviaOff-PolicyEvaluation

Neural Information Processing Systems 

This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates.

Similar Docs  Excel Report  more

TitleSimilaritySource
None found