Review for NeurIPS paper: Calibrating Deep Neural Networks using Focal Loss

Neural Information Processing Systems 

Weaknesses: - It is not clear from the paper whether weight decay was used in the experiments, for both the cross-entropy and the focal loss. Weight decay has a non-negligible effect on weight norms. The curves in Fig. 2(e) suggest that weight decay was used, but this is not mentioned in the text. Note that some learning rate schedulers remove weight decay at low learning rate values. Could the authors please clarify this aspect?
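
To illustrate why this matters, here is a minimal toy sketch of how weight decay changes the trajectory of the weight norm. It is not the paper's setup: the 1-D logistic regression, the four data points, the function name `train_weight_norm`, and all hyperparameter values are assumptions chosen only for illustration. On separable data, plain gradient descent on cross-entropy keeps increasing the weight norm, while an L2 weight decay term makes it level off.

```python
import math

def train_weight_norm(weight_decay, steps=2000, lr=0.1):
    """Gradient descent on 1-D logistic regression (hypothetical toy problem).

    Returns the absolute value of the single weight after every step.
    """
    # Linearly separable toy data: negative inputs are class 0, positive class 1.
    xs = [-2.0, -1.0, 1.0, 2.0]
    ys = [0.0, 0.0, 1.0, 1.0]
    w = 0.0
    norms = []
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-w * x))  # sigmoid prediction
            grad += (p - y) * x                 # cross-entropy gradient w.r.t. w
        # Mean data gradient plus the L2 (weight decay) term.
        grad = grad / len(xs) + weight_decay * w
        w -= lr * grad
        norms.append(abs(w))
    return norms

no_decay = train_weight_norm(weight_decay=0.0)
decay = train_weight_norm(weight_decay=0.01)

print("final |w| without weight decay:", no_decay[-1])
print("final |w| with weight decay:   ", decay[-1])
```

In this sketch the norm without weight decay is still growing at the end of training, whereas the run with weight decay settles at a smaller value. Comparing weight-norm curves across losses is therefore hard to interpret unless the weight decay setting is stated.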