Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

Neural Information Processing Systems 

Here H is the horizon of the RL environment, and dim_R specifies the complexity of the function class representing the reward function.
