Are Domain Generalization Benchmarks with Accuracy on the Line Misspecified?
