Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning
–Neural Information Processing Systems
Here H is the horizon of the RL environment, and dim_R specifies the complexity of the function class representing the reward function.
Feb-8-2026, 17:14:21 GMT