B More Experimental Setups

Neural Information Processing Systems 

Example Reweighting directly assigns an importance weight to the standard CE training loss, according to the bias degree $\beta$:

$$\mathcal{L}_{reweight} = -(1-\beta)\, y \cdot \log p_m \qquad (3)$$

Confidence Regularization is based on knowledge distillation [9]. It involves a teacher model trained with the standard CE loss.

Specifically, we calculate the weighted average of the F1 score of each class. The splits used for evaluation are highlighted in red.

To address this problem, we select the best checkpoint after $0.7\,t_{\max}$ of training, but still according to the performance on the ID dev set.
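As an illustration only, the Example Reweighting loss of Eq. (3) and the late-checkpoint selection rule can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `reweight_loss` and `select_checkpoint`, the dictionary format for dev-set scores, and the assumption that the bias degree β is already available as a number in [0, 1] are all hypothetical.

```python
import math


def reweight_loss(probs, label, beta):
    """Per-example loss of Eq. (3): -(1 - beta) * y . log(p_m).

    probs: class probabilities p_m from the main model
    label: index of the gold class; y is one-hot, so the dot product
           reduces to the log-probability of the gold class
    beta:  bias degree, assumed precomputed and in [0, 1]
    """
    return -(1.0 - beta) * math.log(probs[label])


def select_checkpoint(dev_scores, t_max, start_frac=0.7):
    """Pick the best checkpoint among those saved after start_frac * t_max.

    dev_scores: dict mapping training step -> ID dev-set score
                (hypothetical format, chosen for illustration)
    Returns the training step of the selected checkpoint.
    """
    # Only checkpoints from the late phase of training are eligible.
    eligible = {t: s for t, s in dev_scores.items() if t >= start_frac * t_max}
    # Among the eligible ones, the ID dev-set score still decides.
    return max(eligible, key=eligible.get)
```

With β = 0 the loss reduces to the standard CE loss, and as β approaches 1 the example contributes almost nothing to training. In the selection function, an early checkpoint is ignored even if it has the highest ID dev score.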