Benign, Tempered, or Catastrophic: Toward a Refined Taxonomy of Overfitting

Neural Information Processing Systems

The practical success of overparameterized neural networks has motivated the recent scientific study of \emph{interpolating methods} -- learning methods which are able to fit their training data perfectly. Empirically, certain interpolating methods can fit noisy training data without catastrophically bad test performance, which defies standard intuitions from statistical learning theory. Aiming to explain this, a large body of recent work has studied \emph{benign overfitting}, a behavior seen in certain asymptotic settings under which interpolating methods approach Bayes-optimality even in the presence of noise. In this work, we argue that, while benign overfitting has been instructive to study, real interpolating methods like deep networks do not fit benignly. That is, noise in the train set leads to suboptimal generalization, suggesting that these methods fall in an intermediate regime between benign and catastrophic overfitting, in which asymptotic risk is neither Bayes-optimal nor unbounded, with the confounding effect of the noise being ``tempered'' but non-negligible. We call this behavior \textit{tempered overfitting}. We first provide broad empirical evidence for our three-part taxonomy, demonstrating that deep neural networks and kernel machines fit to noisy data can be reasonably well classified as benign, tempered, or catastrophic. We then specialize to kernel (ridge) regression (KR), obtaining conditions on the ridge parameter and kernel eigenspectrum under which KR exhibits each of the three behaviors, and demonstrating the consequences for KR with common kernels and trained neural networks of infinite width using experiments on natural and synthetic datasets.
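
As a minimal, hypothetical illustration of the setting the abstract describes (not code from the paper), the sketch below fits kernel ridge regression to noisy labels and reports test error as the ridge parameter varies; the target function, RBF kernel, noise level, and sample sizes are arbitrary choices made only for this example.

# Illustrative sketch (assumed setup, not the paper's experiments): kernel ridge
# regression fit to noisy labels, with test error reported for several values of
# the ridge parameter.
import numpy as np
from sklearn.kernel_ridge import KernelRidge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n_train, n_test = 200, 1000
X_train = rng.uniform(-1, 1, size=(n_train, 1))
X_test = rng.uniform(-1, 1, size=(n_test, 1))

def target(x):
    # Clean underlying function; train labels get additive Gaussian noise.
    return np.sin(2 * np.pi * x).ravel()

y_train = target(X_train) + 0.5 * rng.standard_normal(n_train)  # noisy training labels
y_test = target(X_test)                                          # noiseless test targets

for alpha in [1e-6, 1e-3, 1e-1, 1.0]:
    # A vanishing ridge parameter nearly interpolates the noisy labels;
    # larger values smooth the fit at the cost of extra bias.
    model = KernelRidge(alpha=alpha, kernel="rbf", gamma=10.0)
    model.fit(X_train, y_train)
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"ridge alpha={alpha:g}  test MSE={mse:.3f}")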


Continual Learning for Adaptive AI Systems

Amin, Md Hasibul, Alam, Tamzid Tanvi

arXiv.org Artificial Intelligence

Continual learning, the ability of a neural network to learn multiple sequential tasks without catastrophic forgetting, remains a central challenge in developing adaptive artificial intelligence systems. While deep learning models achieve state-of-the-art performance across domains, they remain limited by overfitting and forgetting. This paper introduces Cluster-Aware Replay (CAR), a hybrid continual learning framework that integrates a small, class-balanced replay buffer with a regularization term based on Inter-Cluster Fitness (ICF) in the feature space. The ICF loss penalizes overlapping feature representations between new and previously learned tasks, encouraging geometric separation in the latent space and reducing interference. Using the standard five-task Split CIFAR-10 benchmark with a ResNet-18 backbone, initial experiments demonstrate that CAR preserves earlier task performance better than fine-tuning alone. These findings are preliminary but highlight feature-space regularization as a promising direction for mitigating catastrophic forgetting.
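
The abstract does not give the exact form of the ICF term, so the following is only a hypothetical sketch of the general idea: a training step that combines cross-entropy on new-task and replayed old-task samples with an assumed stand-in penalty that shrinks as the two batches' feature centroids move apart. The names (cluster_separation_penalty, car_style_step, lam) and the decomposition into a feature trunk plus classifier head are illustrative assumptions, not the paper's definitions.

# Hypothetical sketch of a replay-plus-feature-separation training step.
import torch
import torch.nn.functional as F

def cluster_separation_penalty(feats_new, feats_old, eps=1e-6):
    # Assumed stand-in for the ICF term: inverse squared distance between the
    # mean feature (centroid) of the new-task batch and the replayed old-task batch.
    c_new = feats_new.mean(dim=0)
    c_old = feats_old.mean(dim=0)
    return 1.0 / (torch.sum((c_new - c_old) ** 2) + eps)

def car_style_step(model, head, new_batch, replay_batch, optimizer, lam=0.1):
    (x_new, y_new), (x_old, y_old) = new_batch, replay_batch
    f_new, f_old = model(x_new), model(x_old)                 # feature extractor, e.g. a ResNet-18 trunk
    logits = head(torch.cat([f_new, f_old]))                  # shared classifier head
    ce = F.cross_entropy(logits, torch.cat([y_new, y_old]))   # supervised loss on new + replayed samples
    loss = ce + lam * cluster_separation_penalty(f_new, f_old)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()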


A Defect Classification Framework for AI-Based Software Systems (AI-ODC)

Alannsary, Mohammed O.

arXiv.org Artificial Intelligence

Artificial Intelligence has gained considerable attention recently and has been utilized in several fields, ranging from daily life activities, such as responding to emails and scheduling appointments, to manufacturing and the automation of work activities. Artificial Intelligence systems are mainly implemented as software solutions, and it is essential to discover and remove software defects to assure their quality through defect analysis, one of the major activities that contribute to software quality. Despite the proliferation of AI-based systems, current defect analysis models fail to capture their unique attributes. This paper proposes a framework inspired by the Orthogonal Defect Classification (ODC) paradigm that enables defect analysis of Artificial Intelligence systems while recognizing their special attributes and characteristics. This study demonstrated the feasibility of modifying ODC so that it can classify the defects of AI systems. The ODC was adjusted to accommodate the Data, Learning, and Thinking aspects of AI systems, which are newly introduced classification dimensions. This adjustment involved the introduction of an additional attribute to the ODC attributes, the incorporation of a new severity level, and the substitution of impact areas with characteristics pertinent to AI systems. The framework was showcased by applying it to a publicly available Machine Learning bug dataset, with results analyzed through one-way and two-way analysis. The case study indicated that defects occurring during the Learning phase were the most prevalent and were significantly linked to high-severity classifications. In contrast, defects identified in the Thinking phase had a disproportionate effect on trustworthiness and accuracy. These findings illustrate AI-ODC's capability to identify high-risk defect categories and inform focused quality assurance measures.
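
As a rough illustration of how such a classification scheme could be tallied (the paper's exact attribute values are not listed in the abstract, so the enumerations below are assumed placeholders), this sketch records defects by phase, severity, and an AI-specific impact characteristic, then produces one-way and two-way frequency counts.

# Illustrative sketch with placeholder attribute values, not the paper's schema.
from collections import Counter
from dataclasses import dataclass

AI_PHASES = ("Data", "Learning", "Thinking")               # newly introduced dimension
SEVERITIES = ("Low", "Medium", "High", "Critical")         # assumed set including an added level
AI_IMPACTS = ("Trustworthiness", "Accuracy", "Fairness")   # assumed AI-specific impact characteristics

@dataclass
class Defect:
    phase: str      # which AI aspect the defect belongs to
    severity: str
    impact: str

defects = [
    Defect("Learning", "High", "Accuracy"),
    Defect("Thinking", "Medium", "Trustworthiness"),
    Defect("Data", "Low", "Fairness"),
]

one_way = Counter(d.phase for d in defects)               # frequency per single attribute
two_way = Counter((d.phase, d.severity) for d in defects) # frequency per attribute pair
print(one_way)
print(two_way)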


Challenges and Contributing Factors in the Utilization of Large Language Models (LLMs)

Chen, Xiaoliang, Li, Liangbin, Chang, Le, Huang, Yunhe, Zhao, Yuxuan, Zhang, Yuxiao, Li, Dinuo

arXiv.org Artificial Intelligence

With the development of large language models (LLMs) such as the GPT series, their widespread use across various application scenarios presents a myriad of challenges. This review initially explores the issue of domain specificity, where LLMs may struggle to provide precise answers to specialized questions within niche fields. The problem of knowledge forgetting arises as these LLMs might find it hard to balance old and new information. The knowledge repetition phenomenon reveals that LLMs sometimes deliver overly mechanized responses, lacking depth and originality. Furthermore, knowledge illusion describes situations where LLMs might provide answers that seem insightful but are actually superficial, while knowledge toxicity focuses on harmful or biased information outputs. These challenges underscore problems in the training data and algorithmic design of LLMs. To address these issues, the review suggests diversifying training data, fine-tuning models, enhancing transparency and interpretability, and incorporating ethics and fairness training. Future technological trends might lean towards iterative methodologies, multimodal learning, model personalization and customization, and real-time learning and feedback mechanisms. In conclusion, future LLMs should prioritize fairness, transparency, and ethics, ensuring they uphold high moral and ethical standards when serving humanity.