Self-Consistency of the Internal Reward Models Improves Self-Rewarding Language Models

Open in new window