Preference-based Reinforcement Learning with Finite-Time Guarantees