knowledge retention
- Asia > China (0.04)
- Europe > United Kingdom (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Saturation Self-Organizing Map
Urbanik, Igor, Gajewski, Paweł
Intelligent agents navigating real-world environments must continuously learn, adapting to new information while retaining prior knowledge [14]. This ability, known as continual or lifelong learning, poses a significant challenge in modern machine learning. Most artificial neural systems struggle with catastrophic forgetting [5], where training on new tasks or data distributions abruptly erases previously learned information. This phenomenon stems from the shared nature of representations in standard neural networks, where updating weights for new data can overwrite information critical for past tasks. Overcoming catastrophic forgetting is crucial for developing robust, adaptable systems that can learn incrementally from data streams, rather than being retrained from scratch. Numerous approaches have been proposed to mitigate catastrophic forgetting, ranging from regularization techniques and memory replay to architectural modifications. However, many state-of-the-art solutions, despite showing promising results, require substantial changes to model structure or training procedures. This often limits their compatibility with widely used and well-understood machine learning frameworks.
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
Jiang, Kailin, Jiang, Hongbo, Jiang, Ning, Gao, Zhi, Bi, Jinhe, Ren, Yuchen, Li, Bin, Du, Yuntao, Liu, Lei, Li, Qing
Large Multimodal Models encode extensive factual knowledge in their pre-trained weights. However, this knowledge remains static and limited, unable to keep pace with real-world developments, which hinders continuous knowledge acquisition. Effective knowledge injection thus becomes critical, involving two goals: knowledge adaptation (injecting new knowledge) and knowledge retention (preserving old knowledge). Existing methods often struggle to learn new knowledge and suffer from catastrophic forgetting. To address this, we propose KORE, a synergistic method of KnOwledge-oRientEd augmentations and constraints for injecting new knowledge into large multimodal models while preserving old knowledge. Unlike general text or image data augmentation, KORE automatically converts individual knowledge items into structured and comprehensive knowledge to ensure that the model accurately learns new knowledge, enabling accurate adaptation. Meanwhile, KORE stores previous knowledge in the covariance matrix of the LMM's linear layer activations and initializes the adapter by projecting the original weights into the matrix's null space, defining a fine-tuning direction that minimizes interference with previous knowledge, enabling strong retention. Extensive experiments on various LMMs, including LLaVA-v1.5-7B, LLaVA-v1.5-13B, and Qwen2.5-VL-7B, show that KORE achieves superior new knowledge injection performance and effectively mitigates catastrophic forgetting.
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)
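A minimal sketch of the retention idea described in the KORE abstract above, assuming "previous knowledge" is summarized by an uncentered covariance of a linear layer's input activations and that adapter updates are projected onto its approximate null space; the variable names and energy threshold are illustrative, not KORE's actual implementation.

```python
import numpy as np

def activation_covariance(activations):
    # activations: (num_samples, d_in) inputs seen by a linear layer on old data
    return activations.T @ activations / activations.shape[0]

def null_space_projector(cov, energy=0.99):
    # Keep the eigen-directions that explain `energy` of the old-activation
    # variance; the returned projector maps updates onto the remaining
    # (approximate null) space, so fine-tuning barely perturbs outputs on
    # old-knowledge inputs.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), energy) + 1
    U = eigvecs[:, :k]                     # principal subspace of old activations
    return np.eye(cov.shape[0]) - U @ U.T  # projector onto its complement

# Toy usage: restrict a candidate adapter update to low-energy directions,
# so it (approximately) leaves outputs on old inputs unchanged.
rng = np.random.default_rng(0)
old_acts = rng.normal(size=(1024, 64))
P = null_space_projector(activation_covariance(old_acts))
delta_W = rng.normal(size=(32, 64))        # candidate update, shape (d_out, d_in)
delta_W_safe = delta_W @ P
```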
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Lv, Kangtao, Chen, Haibin, Yuan, Yujin, Liu, Langming, Liu, Shilei, Wang, Yongwei, Su, Wenbo, Zheng, Bo
Large language models (LLMs) have attracted significant attention due to their impressive general capabilities across diverse downstream tasks. However, without domain-specific optimization, they often underperform on specialized knowledge benchmarks and even produce hallucinations. Recent studies show that strategically infusing domain knowledge during pretraining can substantially improve downstream performance. A critical challenge lies in balancing this infusion trade-off: injecting too little domain-specific data yields insufficient specialization, whereas excessive infusion triggers catastrophic forgetting of previously acquired knowledge. In this work, we focus on the phenomenon of memory collapse induced by over-infusion. Through systematic experiments, we make two key observations: 1) Critical collapse point: each model exhibits a threshold beyond which its knowledge retention capability sharply degrades. 2) Scale correlation: these collapse points scale consistently with model size. Building on these insights, we propose a knowledge infusion scaling law that predicts the optimal amount of domain knowledge to inject into large models by analyzing their smaller counterparts. Extensive experiments across different model sizes and pretraining token budgets validate both the effectiveness and generalizability of our scaling law.
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > Dominican Republic (0.04)
- (5 more...)
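The abstract above does not state the functional form of the scaling law, so the following sketch only illustrates the idea: it assumes a power law (collapse point ≈ a · params^b), fits it to hypothetical collapse-point measurements on small models, and extrapolates to a larger one. All numbers are invented.

```python
import numpy as np

# Hypothetical (model size in parameters, measured collapse point in domain tokens).
sizes = np.array([1.3e8, 3.5e8, 7.6e8, 1.4e9])
collapse_tokens = np.array([2.1e8, 5.0e8, 9.8e8, 1.7e9])

# Fit log(collapse) = log(a) + b * log(size), i.e. collapse ≈ a * size**b.
b, log_a = np.polyfit(np.log(sizes), np.log(collapse_tokens), 1)

def predicted_collapse(n_params):
    return np.exp(log_a) * n_params ** b

# Extrapolate a safe domain-infusion budget for a larger model.
print(f"predicted collapse point at 7B params: {predicted_collapse(7e9):.3e} tokens")
```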
CCD: Continual Consistency Diffusion for Lifelong Generative Modeling
Liu, Jingren, Xu, Shuning, Wang, Yun, Ji, Zhong, Chen, Xiangyu
While diffusion-based models have shown remarkable generative capabilities in static settings, their extension to continual learning (CL) scenarios remains fundamentally constrained by Generative Catastrophic Forgetting (GCF). We observe that even with a rehearsal buffer, new generative skills often overwrite previous ones, degrading performance on earlier tasks. Although some initial efforts have explored this space, most rely on heuristics borrowed from continual classification methods or use trained diffusion models as ad hoc replay generators, lacking a principled, unified solution for mitigating GCF, and experiments are often conducted under fragmented and inconsistent settings. To address this gap, we introduce Continual Diffusion Generation (CDG), a structured pipeline that redefines how diffusion models are implemented under CL and enables systematic evaluation of GCF. Beyond the empirical pipeline, we propose the first theoretical foundation for CDG, grounded in a cross-task analysis of diffusion-specific generative dynamics. Our theoretical investigation identifies three fundamental consistency principles essential for preserving knowledge in the rehearsal buffer over time: inter-task knowledge consistency, unconditional knowledge consistency, and prior knowledge consistency. These criteria expose the latent mechanisms through which generative forgetting manifests across sequential tasks. Motivated by these insights, we further propose \textit{Continual Consistency Diffusion} (CCD), a principled training framework that enforces these consistency objectives via hierarchical loss functions: $\mathcal{L}_{IKC}$, $\mathcal{L}_{UKC}$, and $\mathcal{L}_{PKC}$. Extensive experiments show that CCD achieves state-of-the-art performance across various benchmarks, especially improving generative metrics in overlapping-task scenarios.
- Europe > United Kingdom > UK North Sea (0.25)
- Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.25)
- North America > Canada > Ontario > Toronto (0.14)
- (4 more...)
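The three consistency terms in the CCD abstract are only named, not defined, so the sketch below shows the generic shape such an objective could take: each term is stood in for by a mean-squared discrepancy between the current denoiser and a frozen reference on rehearsal samples, with arbitrary weights. This is an assumption-laden stand-in, not CCD's actual losses.

```python
import torch
import torch.nn.functional as F

def match_frozen(current, frozen, x_t, t, cond):
    """One generic consistency term: keep the current denoiser's prediction
    close to a frozen reference on rehearsal-buffer samples."""
    with torch.no_grad():
        target = frozen(x_t, t, cond)
    return F.mse_loss(current(x_t, t, cond), target)

def ccd_style_objective(current, prev_task_model, pretrained_model,
                        x_t, t, cond, null_cond,
                        w_ikc=1.0, w_ukc=0.5, w_pkc=0.5):
    # Illustrative stand-ins for L_IKC, L_UKC, L_PKC; weights are arbitrary.
    l_ikc = match_frozen(current, prev_task_model, x_t, t, cond)       # inter-task knowledge
    l_ukc = match_frozen(current, prev_task_model, x_t, t, null_cond)  # unconditional knowledge
    l_pkc = match_frozen(current, pretrained_model, x_t, t, cond)      # prior knowledge
    return w_ikc * l_ikc + w_ukc * l_ukc + w_pkc * l_pkc

# Toy usage with stand-in denoisers that ignore the timestep and condition.
toy = lambda m: (lambda x_t, t, cond: m(x_t))
cur, prev, pre = (torch.nn.Linear(8, 8) for _ in range(3))
loss = ccd_style_objective(toy(cur), toy(prev), toy(pre),
                           torch.randn(4, 8), t=None, cond=None, null_cond=None)
loss.backward()
```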
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Long-term memory (LTM) is essential for large language models (LLMs) to achieve autonomous intelligence in complex, evolving environments. Despite increasing efforts in memory-augmented and retrieval-based architectures, there remains a lack of standardized benchmarks for systematically evaluating LLMs' long-term memory abilities. Existing benchmarks still struggle to evaluate knowledge retention and dynamic sequential reasoning, and they lack flexibility, all of which limits their effectiveness in assessing models' LTM capabilities. To address these gaps, we propose a novel benchmark framework based on interactive fiction games, featuring dynamically branching storylines with complex reasoning structures. These structures simulate real-world scenarios by requiring LLMs to navigate hierarchical decision trees, where each choice triggers cascading dependencies across multi-turn interactions. Our benchmark emphasizes two distinct settings to test reasoning complexity: one with immediate feedback upon incorrect decisions, and the other requiring models to independently trace back and revise earlier choices after failure. As part of this benchmark, we also construct a new dataset designed to test LLMs' LTM within narrative-driven environments. We further validate the effectiveness of our approach through detailed experiments. Experimental results demonstrate the benchmark's ability to robustly and reliably assess LTM in LLMs.
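As a rough illustration of the two evaluation settings described above (immediate feedback versus independent backtracking), the toy loop below walks an invented branching story; the data format and agent interface are placeholders, not the benchmark's actual schema.

```python
import random

# Invented toy branching story; each node offers choices leading to other nodes.
STORY = {
    "start": {"choices": {"door": "hall", "stairs": "dead_end"}},
    "hall": {"choices": {"key": "goal", "force": "dead_end"}},
    "dead_end": {"choices": {}},
    "goal": {"choices": {}},
}

def evaluate(choose, immediate_feedback=True, max_steps=20):
    """Walk the story tree. With immediate feedback, a failed branch bounces the
    agent back to its previous decision point; otherwise it must restart and
    retrace its own earlier choices after failure."""
    node, trail = "start", []
    for _ in range(max_steps):
        if node == "goal":
            return True, trail
        choices = STORY[node]["choices"]
        if not choices:  # failure leaf
            if immediate_feedback and trail:
                node = trail.pop()          # retry from the last decision point
            else:
                node, trail = "start", []   # must re-derive the whole path
            continue
        trail.append(node)
        node = choices[choose(node, list(choices))]
    return False, trail

# Usage with a placeholder agent that picks randomly (an LLM would go here).
print(evaluate(lambda node, options: random.choice(options)))
```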
Data Doping or True Intelligence? Evaluating the Transferability of Injected Knowledge in LLMs
Jan, Essa, Ali, Moiz, Hassan, Muhammad Saram, Zaffar, Fareed, Zaki, Yasir
As the knowledge of large language models (LLMs) becomes outdated over time, there is a growing need for efficient methods to update them, especially when injecting proprietary information. Our study reveals that comprehension-intensive fine-tuning tasks (e.g., question answering and fill-in-the-blank) achieve substantially higher knowledge retention rates (48%) than mapping-oriented tasks such as translation (17%) or text-to-JSON conversion (20%), despite exposure to identical factual content. We demonstrate that this pattern persists across model architectures and follows scaling laws, with larger models showing improved retention across all task types. However, all models exhibit significant performance drops when applying injected knowledge in broader contexts, suggesting limited semantic integration. These findings highlight the importance of task selection when updating LLM knowledge, showing that effective knowledge injection relies not just on data exposure but on the depth of cognitive engagement during fine-tuning.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Syria (0.14)
- North America > United States > Pennsylvania (0.05)
- (15 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Leisure & Entertainment (0.68)
- Government > Voting & Elections (0.68)
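To make the task-format contrast in the abstract above concrete, here is a small illustration of how a single fact might be rendered as a comprehension-style QA example versus a mapping-style text-to-JSON example; the templates and the fact itself are invented, as the abstract does not specify them.

```python
import json

fact = {"entity": "Acme Corp", "attribute": "founding_year", "value": "1999"}

def as_qa(fact):
    # Comprehension-oriented: the model must recall the value from a question.
    return {
        "prompt": f"In what year was {fact['entity']} founded?",
        "completion": f"{fact['entity']} was founded in {fact['value']}.",
    }

def as_text_to_json(fact):
    # Mapping-oriented: the value is already present in the input; the model
    # only reformats it, which the study links to weaker retention.
    sentence = f"{fact['entity']} was founded in {fact['value']}."
    return {
        "prompt": f"Convert to JSON: {sentence}",
        "completion": json.dumps({fact["attribute"]: fact["value"]}),
    }

print(as_qa(fact))
print(as_text_to_json(fact))
```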
Harmony: A Unified Framework for Modality Incremental Learning
Song, Yaguang, Yang, Xiaoshan, Jiang, Dongmei, Wang, Yaowei, Xu, Changsheng
Incremental learning aims to enable models to continuously acquire knowledge from evolving data streams while preserving previously learned capabilities. While current research predominantly focuses on unimodal incremental learning and on multimodal incremental learning where the modalities are consistent, real-world scenarios often present data from entirely new modalities, posing additional challenges. This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences. To this end, we introduce a novel paradigm called Modality Incremental Learning (MIL), where each learning stage involves data from a distinct modality. To address this task, we propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention, enabling the model to reduce modal discrepancy, learn from a sequence of distinct modalities, and ultimately complete tasks across multiple modalities within a unified framework. Our approach introduces adaptive compatible feature modulation and cumulative modal bridging. By constructing historical modal features and performing modal knowledge accumulation and alignment, these components work in concert to bridge modal differences and maintain knowledge retention, even when only unimodal data is available at each learning stage. Extensive experiments on the MIL task demonstrate that our proposed method significantly outperforms existing incremental learning methods, validating its effectiveness in MIL scenarios.
No Forgetting Learning: Memory-free Continual Learning
Vahedifar, Mohammad Ali, Zhang, Qi
Continual Learning (CL) remains a central challenge in deep learning, where models must sequentially acquire new knowledge while mitigating Catastrophic Forgetting (CF) of prior tasks. Existing approaches often struggle with efficiency and scalability, requiring extensive memory or model buffers. This work introduces "No Forgetting Learning" (NFL), a memory-free CL framework that leverages knowledge distillation to maintain stability while preserving plasticity; memory-free means that NFL does not rely on any memory buffer. Through extensive evaluations on three benchmark datasets, we demonstrate that NFL achieves competitive performance while using approximately 14.75 times less memory than state-of-the-art methods. Furthermore, we introduce a new metric to better assess the plasticity-stability trade-off in CL.
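The abstract says only that NFL uses knowledge distillation without a memory buffer; the snippet below sketches the generic form such a memory-free objective often takes (an LwF-style distillation term between a frozen copy of the previous model and the current one). It is a stand-in under that assumption, not NFL's specific formulation.

```python
import torch
import torch.nn.functional as F

def memory_free_cl_loss(current_logits, frozen_prev_logits, labels,
                        temperature=2.0, alpha=0.5):
    """Cross-entropy on the new task plus a distillation term that keeps the
    current model's predictions close to a frozen copy of its previous self,
    so no samples from earlier tasks need to be stored."""
    ce = F.cross_entropy(current_logits, labels)
    kd = F.kl_div(
        F.log_softmax(current_logits / temperature, dim=-1),
        F.softmax(frozen_prev_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random logits standing in for model outputs.
logits_new = torch.randn(8, 10, requires_grad=True)
logits_old = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = memory_free_cl_loss(logits_new, logits_old, labels)
loss.backward()
```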