Reviews: Generalized Cross Entropy Loss for Training Deep Neural Networks with Noisy Labels
–Neural Information Processing Systems
The key insight comes from analyzing the loss function gradients: they are equivalent, except that CCE includes a term that implicitly assigns higher weights to incorrect predictions. This makes training with CCE faster than with MAE but also makes it more susceptible to overfitting label noise. Like CCE, the gradient of the Lq loss yields a weighting term, but with an exponent parameter q that we can choose. As q approaches 0, we recover CCE, and at q = 1 the weighting term disappears, which is equivalent to MAE. The paper shows that a variant of a known risk bound for MAE under uniform label noise applies to the Lq loss as q approaches 1. Experimental results are noticeably strong: Lq consistently outperforms both CCE and MAE and is competitive with several strong alternative baselines.
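The interpolation described above can be sketched numerically. The following is a minimal illustration (not the authors' code) of the Lq loss, L_q(p) = (1 - p^q) / q, applied to the predicted probability p of the true class; the function name and chosen probabilities are my own for demonstration:

```python
import math


def lq_loss(p, q):
    """Lq (generalized cross entropy) loss on true-class probability p:
    L_q(p) = (1 - p**q) / q.
    As q -> 0 this approaches -log(p), i.e. CCE; at q = 1 it equals
    1 - p, which matches MAE up to a constant. Its gradient magnitude
    w.r.t. p is p**(q - 1): for the CCE limit this is 1/p, implicitly
    upweighting examples the model gets wrong, while for MAE (q = 1)
    it is constant."""
    return (1.0 - p ** q) / q


p = 0.7  # example predicted probability for the labeled class
print(lq_loss(p, 1e-6))  # approaches -log(0.7), the CCE limit
print(lq_loss(p, 0.7))   # an intermediate q trades off the two regimes
print(lq_loss(p, 1.0))   # equals 1 - 0.7 = 0.3, the MAE case
```

For a fixed p, the loss decreases monotonically as q grows from 0 to 1, reflecting the weaker penalty (and flatter gradient) that makes the MAE end more robust to noisy labels.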