Review for NeurIPS paper: Preference-based Reinforcement Learning with Finite-Time Guarantees

Neural Information Processing Systems 

Weaknesses: There are two main weaknesses. First, I'm not sure whether the core contribution is meant to be the algorithm or the analysis. If it's the algorithm, then the paper needs to test it in more than toy settings (and ideally with real humans, rather than simulating answers with BTL under two parameter settings). But if it's the analysis, I almost feel like the experiments are distracting, or at least overstated and drawing attention away from the main contributions. I'd love to hear the authors' perspective on this, but my suggestion would be to either a) get the best of both worlds by running a more serious experiment, or b) edit the paper to highlight the analysis and justify the experiments as showing what the algorithm does empirically, perhaps with some qualitative analysis of the resulting behavior on simple tasks to aid understanding of the algorithm.