Evaluating Large Language Models for Generalization and Robustness via Data Compression