Evaluating Large Language Models for Generalization and Robustness via Data Compression
