Expert-Supervised Reinforcement Learning for Offline Policy Learning and Evaluation
–Neural Information Processing Systems
Sample efficiency of ESRL is independent of the chosen risk aversion threshold and quality of the behavior policy.
Feb-19-2026, 07:58:40 GMT