Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems 

We first show that a unique optimal policy may not exist if preferences over trajectories are deterministic for PbRL.
