Measuring Value Understanding in Language Models through Discriminator-Critique Gap
