CREAM: Consistency Regularized Self-Rewarding Language Models

Open in new window