Preference-based Reinforcement Learning with Finite-Time Guarantees
–Neural Information Processing Systems
We first show that a unique optimal policy may not exist if preferences over trajectories are deterministic for PbRL.
Feb-10-2026, 16:31:07 GMT