catastrophic interference
Reawakening knowledge: Anticipatory recovery from catastrophic interference via structured training
We explore the training dynamics of neural networks in a structured non-IID setting where documents are presented cyclically in a fixed, repeated sequence. Typically, networks suffer from catastrophic interference when training on a sequence of documents; however, we discover a curious and remarkable property of LLMs finetuned sequentially in this setting: they exhibit anticipatory behavior, recovering from the forgetting on documents before seeing them again. The behavior emerges and becomes more robust as the architecture scales up its number of parameters. Through comprehensive experiments and visualizations, we uncover new insights into training over-parameterized networks in structured environments.
- North America > United States > Colorado > Boulder County > Boulder (0.04)
- North America > Dominican Republic (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Education (1.00)
- Information Technology (0.67)
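The cyclic presentation protocol described in the abstract above can be sketched as follows. This is a minimal toy illustration, not the authors' setup: each "document" is a hypothetical target vector, the "model" is a single parameter vector trained by gradient descent on squared error, and the names `cyclic_order` and `run_cyclic_training` are invented here. The sketch only shows how to train on a fixed, repeated sequence while tracking the loss on one document; it makes no claim to reproduce the anticipatory recovery reported for LLMs.

```python
import random


def cyclic_order(num_docs, num_cycles):
    """Fixed, repeated presentation order: 0, 1, ..., num_docs - 1, 0, 1, ..."""
    return [d for _ in range(num_cycles) for d in range(num_docs)]


def squared_error(w, target):
    """Toy loss: squared distance between the parameters and a document's target."""
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def run_cyclic_training(num_docs=5, num_cycles=3, steps_per_doc=10,
                        dim=4, lr=0.1, seed=0):
    """Train sequentially on documents in cyclic order.

    Returns a list of (document_index, loss_on_document_0) pairs, one entry
    recorded after finishing the gradient steps on each document.
    """
    rng = random.Random(seed)
    # Hypothetical stand-in for documents: one random target vector each.
    targets = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_docs)]
    w = [0.0] * dim
    history = []
    for d in cyclic_order(num_docs, num_cycles):
        for _ in range(steps_per_doc):
            # Gradient of ||w - t||^2 with respect to w is 2 * (w - t).
            w = [wi - lr * 2.0 * (wi - ti) for wi, ti in zip(w, targets[d])]
        history.append((d, squared_error(w, targets[0])))
    return history


if __name__ == "__main__":
    for doc, loss in run_cyclic_training():
        print(f"trained on doc {doc}: loss on doc 0 = {loss:.4f}")
```

Plotting the second element of each pair against the step index gives the kind of per-document loss curve on which forgetting and any recovery would be read off.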
Understanding Catastrophic Interference: On the Identifiability of Latent Representations
Li, Yuke, Zheng, Yujia, Xiong, Tianyi, Wang, Zhenyi, Huang, Heng
Catastrophic interference, also known as catastrophic forgetting, is a fundamental challenge in machine learning, where a trained learning model progressively loses performance on previously learned tasks when adapting to new ones. In this paper, we aim to better understand and model the catastrophic interference problem from a latent representation learning point of view, and propose a novel theoretical framework that formulates catastrophic interference as an identification problem. Our analysis demonstrates that the forgetting phenomenon can be quantified by the distance between partial-task aware (PTA) and all-task aware (ATA) setups. Building upon recent advances in identifiability theory, we prove that this distance can be minimized through identification of shared latent variables between these setups. For learning, we propose a method with a two-stage training strategy: First, we employ maximum likelihood estimation to learn the latent representations from both PTA and ATA configurations. Subsequently, we optimize the KL divergence to identify and learn the shared latent variables. Through theoretical guarantees and empirical validation, we establish that identifying and learning these shared representations can effectively mitigate catastrophic interference in machine learning systems. Our approach provides both theoretical guarantees and practical performance improvements across both synthetic and benchmark datasets.
- North America > United States > Maryland > Prince George's County > College Park (0.14)
- North America > United States > Florida > Orange County > Orlando (0.14)
- Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
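The second training stage in the abstract above optimizes a KL divergence between latent representations of the PTA and ATA setups. The abstract does not say which distributions are involved, so the sketch below assumes, purely for illustration, that both latents are diagonal Gaussians, for which the KL divergence has a closed form. The function name `kl_diag_gaussians` and its parameters are hypothetical and are not taken from the paper.

```python
import math


def kl_diag_gaussians(mu_p, sigma_p, mu_q, sigma_q):
    """KL(p || q) for diagonal Gaussians p = N(mu_p, sigma_p^2), q = N(mu_q, sigma_q^2).

    Per dimension: log(sigma_q / sigma_p)
                   + (sigma_p^2 + (mu_p - mu_q)^2) / (2 * sigma_q^2) - 1/2.
    All standard deviations must be strictly positive.
    """
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, sigma_p, mu_q, sigma_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


if __name__ == "__main__":
    # Assumed example values: a "PTA" latent and an "ATA" latent in two dimensions.
    pta_mu, pta_sigma = [0.0, 1.0], [1.0, 2.0]
    ata_mu, ata_sigma = [0.5, 1.0], [1.0, 1.5]
    print(kl_diag_gaussians(pta_mu, pta_sigma, ata_mu, ata_sigma))
```

Under this assumption the divergence is zero exactly when the two latent distributions coincide, so it can serve as a distance-like quantity to minimize.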
TACOS: Task Agnostic Continual Learning in Spiking Neural Networks
Soures, Nicholas, Helfer, Peter, Daram, Anurag, Pandit, Tej, Kudithipudi, Dhireesha
Catastrophic interference, the loss of previously learned information when learning new information, remains a major challenge in machine learning. Since living organisms do not seem to suffer from this problem, researchers have taken inspiration from biology to improve memory retention in artificial intelligence systems. However, previous attempts to use bio-inspired mechanisms have typically resulted in systems that rely on task boundary information during training and/or explicit task identification during inference, information that is not available in real-world scenarios. Here, we show that neuro-inspired mechanisms such as synaptic consolidation and metaplasticity can mitigate catastrophic interference in a spiking neural network, using only synapse-local information, with no need for task awareness, and with a fixed memory size that does not need to be increased when training on new tasks. Our model, TACOS, combines neuromodulation with complex synaptic dynamics to enable new learning while protecting previous information. We evaluate TACOS on sequential image recognition tasks and demonstrate its effectiveness in reducing catastrophic interference. Our results show that TACOS outperforms existing regularization techniques in domain-incremental learning scenarios. We also report the results of an ablation study to elucidate the contribution of each neuro-inspired mechanism separately.
- North America > United States > Texas > Bexar County > San Antonio (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
One system for learning and remembering episodes and rules
Hewson, Joshua T. S., Sloman, Sabina J., Dubova, Marina
Humans can learn individual episodes and generalizable rules and also successfully retain both kinds of acquired knowledge over time. In the cognitive science literature, (1) learning individual episodes and rules and (2) learning and remembering are often both conceptualized as competing processes that necessitate separate, complementary learning systems. Inspired by recent research in statistical learning, we challenge these trade-offs, hypothesizing that they arise from capacity limitations rather than from the inherent incompatibility of the underlying cognitive processes. Using an associative learning task, we show that one system with excess representational capacity can learn and remember both episodes and rules.
- North America > United States > Rhode Island > Providence County > Providence (0.05)
- North America > United States > Indiana > Monroe County > Bloomington (0.05)
- Europe > United Kingdom > England > Greater Manchester > Manchester (0.05)
Exploring a Cognitive Architecture for Learning Arithmetic Equations
The acquisition and performance of arithmetic skills and basic operations such as addition, subtraction, multiplication, and division are essential for daily functioning, and reflect complex cognitive processes. This paper explores the cognitive mechanisms powering arithmetic learning, presenting a neurobiologically plausible cognitive architecture that simulates the acquisition of these skills. I implement a number vectorization embedding network and an associative memory model to investigate how an intelligent system can learn and recall arithmetic equations in a manner analogous to the human brain. I perform experiments that provide insights into the generalization capabilities of connectionist models, neurological causes of dyscalculia, and the influence of network architecture on cognitive performance. Through this interdisciplinary investigation, I aim to contribute to ongoing research into the neural correlates of mathematical cognition in intelligent systems.
- North America > United States > California (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Health & Medicine > Therapeutic Area > Neurology (1.00)
- Education > Curriculum > Subject-Specific Education (0.72)
Distal Interference: Exploring the Limits of Model-Based Continual Learning
van Deventer, Heinrich, Bosman, Anna Sergeevna
Continual learning is the sequential learning of different tasks by a machine learning model. Continual learning is known to be hindered by catastrophic interference or forgetting, i.e. rapid unlearning of earlier learned tasks when new tasks are learned. Despite their practical success, artificial neural networks (ANNs) are prone to catastrophic interference. This study analyses how gradient descent and overlapping representations between distant input points lead to distal interference and catastrophic interference. Distal interference refers to the phenomenon where training a model on a subset of the domain leads to non-local changes on other subsets of the domain. This study shows that uniformly trainable models without distal interference must be exponentially large. A novel antisymmetric bounded exponential layer B-spline ANN architecture named ABEL-Spline is proposed that can approximate any continuous function, is uniformly trainable, has polynomial computational complexity, and provides some guarantees for distal interference. Experiments are presented to demonstrate the theoretical properties of ABEL-Splines. ABEL-Splines are also evaluated on benchmark regression problems. It is concluded that the weaker distal interference guarantees in ABEL-Splines are insufficient for model-only continual learning. It is conjectured that continual learning with polynomial complexity models requires augmentation of the training data or algorithm.
- Africa > South Africa > Gauteng > Pretoria (0.04)
- North America > United States > New York (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
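Distal interference, as defined in the abstract above, is a non-local change: training on one part of the domain alters predictions on another. The toy sketch below contrasts a model with a global basis (a line `w0 + w1 * x`) against one with a purely local basis (a lookup table over bins of [0, 1]). Both models and all function names are hypothetical simplifications chosen for illustration; they are not the ABEL-Spline architecture.

```python
def train_global(w, x, y, lr=0.1, steps=50):
    """Gradient descent on squared error for the global model f(x) = w0 + w1 * x."""
    w0, w1 = w
    for _ in range(steps):
        err = (w0 + w1 * x) - y
        w0 -= lr * err        # derivative of 0.5 * err^2 with respect to w0
        w1 -= lr * err * x    # derivative of 0.5 * err^2 with respect to w1
    return (w0, w1)


def predict_global(w, x):
    return w[0] + w[1] * x


def bin_index(x, num_bins):
    """Map x in [0, 1] to one of num_bins equal-width bins."""
    return min(int(x * num_bins), num_bins - 1)


def train_local(table, x, y, lr=0.1, steps=50):
    """Gradient descent for a piecewise-constant model: only the bin containing x moves."""
    table = list(table)
    i = bin_index(x, len(table))
    for _ in range(steps):
        table[i] -= lr * (table[i] - y)
    return table


def predict_local(table, x):
    return table[bin_index(x, len(table))]


if __name__ == "__main__":
    x_train, y_train, x_far = 0.15, 1.0, 0.85
    w = train_global((0.0, 0.0), x_train, y_train)
    table = train_local([0.0] * 10, x_train, y_train)
    # Both models start at 0 everywhere, so these values are the shifts at x_far.
    print("global model shift at x_far:", predict_global(w, x_far))
    print("local model shift at x_far:", predict_local(table, x_far))
```

The local table avoids the distant change, but a table with b bins per axis needs b^d entries in d dimensions, which gives some intuition for why the size of interference-free models is a concern.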
Toward a More Biologically Plausible Neural Network Model of Latent Cause Inference
Lu, Qihong, Nguyen, Tan T., Zhang, Qiong, Hasson, Uri, Griffiths, Thomas L., Zacks, Jeffrey M., Gershman, Samuel J., Norman, Kenneth A.
Humans spontaneously perceive a continuous stream of experience as discrete events. It has been hypothesized that this ability is supported by latent cause inference (LCI). We implemented this hypothesis using Latent Cause Network (LCNet), a neural network model of LCI. LCNet interacts with a Bayesian LCI mechanism that activates a unique context vector for each inferred latent cause. This architecture makes LCNet more biologically plausible than existing models of LCI and supports extraction of shared structure across latent causes. Across three simulations, we found that LCNet could 1) extract shared structure across latent causes in a function-learning task while avoiding catastrophic interference, 2) capture human data on curriculum effects in schema learning, and 3) infer the underlying event structure when processing naturalistic videos of daily activities. Our work provides a biologically plausible computational model that can operate in both laboratory experiment settings and naturalistic settings, opening up the possibility of providing a unified model of event cognition.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > Wisconsin (0.04)
- Asia > Middle East > Jordan (0.04)