Review for NeurIPS paper: Self-Adaptive Training: beyond Empirical Risk Minimization

Neural Information Processing Systems 

Weaknesses: The main weakness of the proposed approach is that it is not supported by any theoretical insight. In particular, the success of the method hinges on the premise that the model can guess the right predictions and thereby correct the noisy labels. Since there is no theoretical criterion for verifying that premise, it is not possible to predict whether the proposed method will work well on new learning tasks. Going further, one can imagine cases where the method would fail and actually perform worse than ERM. For instance, if the model is unable to capture sufficient information from the data distribution (because the data distribution is very complex, there are too few training samples, the model lacks sufficient capacity, or some combination of these), it would be impossible for the model to "bootstrap" its own predictions and guess the correct labels.
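
The failure case can be made concrete with a toy sketch. This is not the paper's algorithm, only an illustration under strong assumptions: binary labels, a fixed set of model predictions (a real model would be retrained on the updated targets), and a moving-average target update with a hypothetical momentum `alpha`. The names `bootstrap_label_accuracy`, `strong`, and `weak` are invented for this example.

```python
def bootstrap_label_accuracy(model_is_wrong, n=100, alpha=0.9, epochs=10):
    """Return (initial, final) accuracy of the training targets.

    model_is_wrong is a hypothetical predicate: it says on which sample
    indices the (fixed) model prediction disagrees with the true label.
    """
    true = [i % 2 for i in range(n)]
    # Assumed 20% label noise: every fifth label is flipped.
    noisy = [1 - y if i % 5 == 0 else y for i, y in enumerate(true)]
    # Simplification: predictions are fixed rather than re-learned each epoch.
    pred = [1 - y if model_is_wrong(i) else y for i, y in enumerate(true)]

    # Soft target = probability assigned to class 1, initialised from the noisy label.
    target = [float(y) for y in noisy]
    for _ in range(epochs):
        # Moving-average update that pulls each target toward the model prediction.
        target = [alpha * t + (1 - alpha) * p for t, p in zip(target, pred)]
    corrected = [1 if t > 0.5 else 0 for t in target]

    def accuracy(labels):
        return sum(a == b for a, b in zip(labels, true)) / n

    return accuracy(noisy), accuracy(corrected)


def strong(i):
    # Model with enough capacity and data: wrong on 10% of the samples.
    return i % 10 == 3


def weak(i):
    # Model that fails to capture the distribution: wrong on 60% of the samples.
    return i % 10 < 6


initial, strong_final = bootstrap_label_accuracy(strong)
_, weak_final = bootstrap_label_accuracy(weak)
print(initial, strong_final, weak_final)
```

Under these assumptions the targets simply converge to the model's predictions, so the corrected labels are only as good as the model: better than the noisy labels for the strong model, worse for the weak one.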