Dataset Featurization: Uncovering Natural Language Features through Unsupervised Data Reconstruction