Review for NeurIPS paper: Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems 

This paper generated considerable discussion among the reviewers. On the positive side, the paper makes a solid contribution to the emerging literature on preference-based RL (PBRL), a topic of some importance, and offers some interesting insights (e.g., on the potential lack of a "winning policy") and novel algorithmic contributions. Conversely, some reviewers raised issues with some of the assumptions made in the paper and with the presentation (which seems to assume familiarity with PBRL and its motivations/rationale). The author response was thoughtful and generated some discussion (some of which is not reflected in the reviews, a couple of which unfortunately were not updated). On my own reading of the paper, I agree that it makes a useful contribution to PBRL, especially from a technical and conceptual perspective (although I don't believe it makes PBRL more practical at this stage).