Review for NeurIPS paper: Learning to summarize with human feedback
Neural Information Processing Systems
Weaknesses: However, I have two major concerns: 1. As the authors themselves acknowledge, this paper is essentially an expanded analysis of [3, 58]. The key techniques, a classification-based reward model and PPO, were already explored in [58]; the main extensions are that this paper uses a larger, better-engineered model and adapts the online setting to the offline setting. I therefore feel the paper offers very little novelty from a machine learning standpoint. To their credit, the authors are candid about this in the Related Work section (Line 86).
May-28-2025, 16:22:42 GMT