Self-training Avoids Using Spurious Features Under Domain Shift
Yining Chen, Colin Wei

Neural Information Processing Systems 

For this setting, we prove that entropy minimization on unlabeled target data avoids using the spurious feature when initialized with a reasonably accurate source classifier, even though the objective is non-convex and has multiple bad local minima that use the spurious feature.
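
The entropy minimization step described above can be illustrated with a minimal sketch. It assumes a linear binary classifier with a sigmoid output, plain gradient descent, and a rescaling of the weights to a fixed norm after every step. The function names, the learning rate, the step count, and the `radius` parameter are illustrative assumptions and are not taken from the paper, whose exact objective and algorithm may differ.

```python
import numpy as np

def mean_entropy(w, X):
    """Average binary entropy of sigmoid(X @ w) over the rows of X."""
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    eps = 1e-12  # guards against log(0)
    return float(np.mean(-p * np.log(p + eps) - (1 - p) * np.log(1 - p + eps)))

def entropy_grad(w, X):
    """Gradient of the average entropy with respect to w.

    For p = sigmoid(z), dH/dz = -z * p * (1 - p); the chain rule with
    z = x @ w gives the per-example gradient (-z * p * (1 - p)) * x.
    """
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))
    return (X * (-z * p * (1 - p))[:, None]).mean(axis=0)

def entropy_minimize(w_init, X_target, lr=0.1, steps=200, radius=1.0):
    """Gradient descent on target entropy, starting from a source classifier.

    w_init must be nonzero. The rescaling to a fixed norm is an assumption
    of this sketch: without it, entropy could be lowered simply by scaling
    w up, with no change to the decision boundary.
    """
    w = radius * w_init / np.linalg.norm(w_init)
    for _ in range(steps):
        w = w - lr * entropy_grad(w, X_target)   # descend on unlabeled data
        w = radius * w / np.linalg.norm(w)       # rescale to the fixed norm
    return w
```

In use, `w_init` would be the weight vector of a classifier trained on labeled source data and `X_target` the unlabeled target inputs; no target labels enter the update. Because the objective is non-convex, where the iterates end up depends on the starting point, which is why the quality of the source classifier matters.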