Language Models can Evaluate Themselves via Probability Discrepancy
