We are glad that all reviewers appreciated the soundness of our work, the importance of the hidden stratification (HS)

Neural Information Processing Systems 

ERM model to obtain a feature representation and then trains a second, robust model. With tuning of learning rate schedules and other hyperparameters (HPs), GEORGE's cost could be further reduced. D.4, we define "inherent hardness" as the minimum possible worst-case subclass We hope that building on this method may also be of independent interest. Our results are fairly insensitive (no significant performance drop) to reasonable variation in these HPs. Additional classification metrics (ISIC omitted for space).