OUI Need to Talk About Weight Decay: A New Perspective on Overfitting Detection

Fernández-Hernández, Alberto, Mestre, Jose I., Dolz, Manuel F., Duato, Jose, Quintana-Ortí, Enrique S.

arXiv.org Machine Learning 

We introduce the Overfitting-Underfitting Indicator (OUI), a novel tool for monitoring the training dynamics of Deep Neural Networks (DNNs) and identifying optimal regularization hyperparameters. Specifically, we validate that OUI can effectively guide the selection of the Weight Decay (WD) hyperparameter by indicating whether a model is overfitting or underfitting during training, without requiring validation data. Through experiments on DenseNet-BC-100 with CIFAR-100, EfficientNet-B0 with TinyImageNet, and ResNet-34 with ImageNet-1K, we show that maintaining OUI within a prescribed interval correlates strongly with improved generalization and validation scores. Notably, OUI converges significantly faster than traditional metrics such as loss or accuracy, enabling practitioners to identify optimal WD values within the early stages of training. By leveraging OUI as a reliable indicator, we can determine early in training whether the chosen WD value leads the model to underfit the training data, overfit, or strike a well-balanced trade-off that maximizes validation scores. This enables more precise WD tuning for optimal performance on the tested datasets and DNNs.

The challenge of overfitting in training DNNs has become increasingly pronounced, fueled by the overparameterization characteristic of many state-of-the-art architectures. Although DNNs with strong expressive power [1]-[3], i.e., the ability to approximate arbitrarily complex functions with increasing precision, hold the promise of exceptional performance in terms of validation scores, they often exploit this capacity by memorizing specific details of the training set that are unrelated to the underlying patterns in the data. This misdirection undermines the DNN's ability to generalize, resulting in a significant gap between training and validation scores.

(Manuel F. Dolz was supported by the Plan Gen-T grant CIDEXG/2022/013 of the Generalitat Valenciana.)
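The selection procedure the abstract describes can be sketched as follows. This is a minimal illustration, not the paper's implementation: `compute_indicator` is a hypothetical stand-in for OUI (whose definition is given in the paper itself), `train_short` is a placeholder for a brief early-training run, and the interval `(0.3, 0.7)` is an arbitrary illustrative choice, not the interval prescribed by the authors.

```python
def select_weight_decay(candidates, train_short, compute_indicator,
                        interval=(0.3, 0.7)):
    """Train briefly with each candidate WD value and keep those whose
    indicator value falls inside the prescribed interval.

    candidates        -- iterable of weight-decay values to try
    train_short       -- callable: wd -> model trained for a few early epochs
    compute_indicator -- callable: model -> indicator value (stand-in for OUI)
    interval          -- (lo, hi) target band for the indicator (illustrative)
    """
    lo, hi = interval
    kept = []
    for wd in candidates:
        model = train_short(wd)        # early epochs only; no validation data
        value = compute_indicator(model)
        if lo <= value <= hi:
            kept.append((wd, value))
    return kept
```

With stub callables (e.g. an indicator that simply echoes a number), `select_weight_decay([0.1, 0.5, 0.9], lambda wd: wd, lambda m: m)` keeps only the middle candidate, mirroring the idea that WD values driving the indicator out of the band are discarded early.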
To address this problem, regularization techniques have emerged as essential tools in modern Deep Learning (DL) [4], [5]. Indeed, understanding and enhancing generalization has become a central focus of contemporary research, as highlighted by works such as [6] and [7].
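As a concrete reminder of the regularizer the paper tunes, the classic weight-decay update folds an L2 penalty into each gradient step. This is a generic textbook sketch (for plain SGD, where the L2-penalty and decoupled formulations coincide), not code from the paper:

```python
def sgd_step_with_wd(w, grad, lr=0.1, wd=1e-2):
    """One SGD update with weight decay: w <- w - lr * (grad + wd * w).

    The wd * w term continually shrinks the weights toward zero,
    penalizing large-magnitude parameters and discouraging memorization.
    """
    return [wi - lr * (gi + wd * wi) for wi, gi in zip(w, grad)]
```

For example, with a zero gradient and `lr=0.1, wd=0.5`, each weight shrinks by a factor of 0.95 per step, which is the regularizing pull that the WD hyperparameter controls.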
