



2cfa8f9e50e0f510ede9d12338a5f564-AuthorFeedback.pdf

Neural Information Processing Systems

We thank the reviewers for their feedback. Our formulation "is generic and task-agnostic and therefore has the potential [to] simplif[y] existing work" (R1) and "has been applied to many loss functions and tasks without any change"; "the experiments cover different tasks and benchmark datasets" (R3). "It is misleading to claim that the paper is the first work using task-agnostic weights that do not require iterative [...]": we do not make such a claim. We believe a simple and easy-to-use idea has potential for great impact. We review related work (in Section 2.1 and in Section 1 of the supplementary). We therefore propose in Section 2.2 [...] (Section 2.3); (2) handle both positive- and negative-valued losses (which justifies the squared regularizer [rather than a] log term) [...]. "Does not bring notably new criteria in determining the sample weights" (R3.3). "SuperLoss does not show an advantage on clean data" (R3.4).


SuperLoss: A Generic Loss for Robust Curriculum Learning

Neural Information Processing Systems

Curriculum learning is a technique to improve model performance and generalization based on the idea that easy samples should be presented before difficult ones during training. While it is generally complex to estimate a priori the difficulty of a given sample, recent works have shown that curriculum learning can be formulated dynamically in a self-supervised manner. The key idea is to estimate the importance (or weight) of each sample directly during training, based on the observation that easy and hard samples behave differently and can therefore be separated. However, these approaches are usually limited to a specific task (e.g., classification) and require extra data annotations, layers or parameters as well as a dedicated training procedure. We propose instead a simple and generic method that can be applied to a variety of losses and tasks without any change in the learning procedure. It consists of appending a novel loss function on top of any existing task loss, hence its name: the SuperLoss. Its main effect is to automatically downweight the contribution of samples with a large loss, i.e. hard samples, effectively mimicking the core principle of curriculum learning. As a side effect, we show that our loss prevents the memorization of noisy samples, making it possible to train from noisy data even with non-robust loss functions. Experimental results on image classification, regression, object detection and image retrieval demonstrate consistent gains, particularly in the presence of noise.





Review for NeurIPS paper: SuperLoss: A Generic Loss for Robust Curriculum Learning

Neural Information Processing Systems

Additional Feedback: Further comments: - The definition of hard and easy examples is limited to their respective confidence scores or losses. Although previous work uses similar definitions, confidence or loss are not always good indicators of the true easiness or hardness of samples; e.g., they can be erroneous at early iterations. The paper lacks an experiment that validates this definition. Some samples are probably hard or noisy examples that the model mistreated as easy; others are probably a mixture of easy, hard, and noisy examples with low confidence across the loss spectrum that the model mistreated as hard.

