UnifyingGradientEstimatorsforMeta-Reinforcement LearningviaOff-PolicyEvaluation
–Neural Information Processing Systems
This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates.
Neural Information Processing Systems
Feb-8-2026, 00:14:24 GMT
- Technology: