Self-Generated Critiques Boost Reward Modeling for Language Models

Open in new window