knowledge retention
- Asia > China (0.04)
- Europe > United Kingdom (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Saturation Self-Organizing Map
Urbanik, Igor, Gajewski, Paweł
Intelligent agents navigating real-world environments must continuously learn, adapting to new information while retaining prior knowledge [14]. This ability, known as continual or lifelong learning, poses a significant challenge in modern machine learning. Most artificial neural systems struggle with catastrophic forgetting [5], where training on new tasks or data distributions abruptly erases previously learned information. This phenomenon stems from the shared nature of representations in standard neural networks, where updating weights for new data can overwrite information critical for past tasks. Overcoming catastrophic forgetting is crucial for developing robust, adaptable systems that can learn incrementally from data streams, rather than being retrained from scratch. Numerous approaches have been proposed to mitigate catastrophic forgetting, ranging from regularization techniques and memory replay to architectural modifications. However, many state-of-the-art solutions, despite showing promising results, require substantial changes to model structure or training procedures. This often limits their compatibility with widely used and well-understood machine learning frameworks.
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
KORE: Enhancing Knowledge Injection for Large Multimodal Models via Knowledge-Oriented Augmentations and Constraints
Jiang, Kailin, Jiang, Hongbo, Jiang, Ning, Gao, Zhi, Bi, Jinhe, Ren, Yuchen, Li, Bin, Du, Yuntao, Liu, Lei, Li, Qing
Large Multimodal Models encode extensive factual knowledge in their pre-trained weights. However, this knowledge remains static and limited, unable to keep pace with real-world developments, which hinders continuous knowledge acquisition. Effective knowledge injection thus becomes critical, involving two goals: knowledge adaptation (injecting new knowledge) and knowledge retention (preserving old knowledge). Existing methods often struggle to learn new knowledge and suffer from catastrophic forgetting. To address this, we propose KORE, a synergistic method of KnOwledge-oRientEd augmentations and constraints for injecting new knowledge into large multimodal models while preserving old knowledge. Unlike general text or image data augmentation, KORE automatically converts individual knowledge items into structured and comprehensive knowledge to ensure that the model accurately learns new knowledge, enabling accurate adaptation. Meanwhile, KORE stores previous knowledge in the covariance matrix of the LMM's linear layer activations and initializes the adapter by projecting the original weights into the matrix's null space, defining a fine-tuning direction that minimizes interference with previous knowledge, enabling strong retention. Extensive experiments on various LMMs, including LLaVA-v1.5-7B, LLaVA-v1.5-13B, and Qwen2.5-VL-7B, show that KORE achieves superior new knowledge injection performance and effectively mitigates catastrophic forgetting.
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.66)
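A minimal sketch of the retention idea described in the KORE abstract above, assuming "previous knowledge" is summarized by an uncentered covariance of a linear layer's input activations and that adapter updates are projected onto its approximate null space; the variable names and energy threshold are illustrative, not KORE's actual implementation.

```python
import numpy as np

def activation_covariance(activations):
    # activations: (num_samples, d_in) inputs seen by a linear layer on old data
    return activations.T @ activations / activations.shape[0]

def null_space_projector(cov, energy=0.99):
    # Keep the eigen-directions that explain `energy` of the old-activation
    # variance; the returned projector maps updates onto the remaining
    # (approximate null) space, so fine-tuning barely perturbs outputs on
    # old-knowledge inputs.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), energy) + 1
    U = eigvecs[:, :k]                     # principal subspace of old activations
    return np.eye(cov.shape[0]) - U @ U.T  # projector onto its complement

# Toy usage: restrict a candidate adapter update to low-energy directions,
# so it (approximately) leaves outputs on old inputs unchanged.
rng = np.random.default_rng(0)
old_acts = rng.normal(size=(1024, 64))
P = null_space_projector(activation_covariance(old_acts))
delta_W = rng.normal(size=(32, 64))        # candidate update, shape (d_out, d_in)
delta_W_safe = delta_W @ P
```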
How to inject knowledge efficiently? Knowledge Infusion Scaling Law for Pre-training Large Language Models
Lv, Kangtao, Chen, Haibin, Yuan, Yujin, Liu, Langming, Liu, Shilei, Wang, Yongwei, Su, Wenbo, Zheng, Bo
Large language models (LLMs) have attracted significant attention due to their impressive general capabilities across diverse downstream tasks. However, without domain-specific optimization, they often underperform on specialized knowledge benchmarks and even produce hallucinations. Recent studies show that strategically infusing domain knowledge during pretraining can substantially improve downstream performance. A critical challenge lies in balancing this infusion trade-off: injecting too little domain-specific data yields insufficient specialization, whereas excessive infusion triggers catastrophic forgetting of previously acquired knowledge. In this work, we focus on the phenomenon of memory collapse induced by over-infusion. Through systematic experiments, we make two key observations: 1) Critical collapse point: each model exhibits a threshold beyond which its knowledge retention capability sharply degrades. 2) Scale correlation: these collapse points scale consistently with model size. Building on these insights, we propose a knowledge infusion scaling law that predicts the optimal amount of domain knowledge to inject into large models by analyzing their smaller counterparts. Extensive experiments across different model sizes and pretraining token budgets validate both the effectiveness and generalizability of our scaling law.
- North America > United States (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > Dominican Republic (0.04)
- (5 more...)
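The abstract above does not state the functional form of the scaling law, so the following sketch only illustrates the idea: it assumes a power law (collapse point ≈ a · params^b), fits it to hypothetical collapse-point measurements on small models, and extrapolates to a larger one. All numbers are invented.

```python
import numpy as np

# Hypothetical (model size in parameters, measured collapse point in domain tokens).
sizes = np.array([1.3e8, 3.5e8, 7.6e8, 1.4e9])
collapse_tokens = np.array([2.1e8, 5.0e8, 9.8e8, 1.7e9])

# Fit log(collapse) = log(a) + b * log(size), i.e. collapse ≈ a * size**b.
b, log_a = np.polyfit(np.log(sizes), np.log(collapse_tokens), 1)

def predicted_collapse(n_params):
    return np.exp(log_a) * n_params ** b

# Extrapolate a safe domain-infusion budget for a larger model.
print(f"predicted collapse point at 7B params: {predicted_collapse(7e9):.3e} tokens")
```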
CCD: Continual Consistency Diffusion for Lifelong Generative Modeling
Liu, Jingren, Xu, Shuning, Wang, Yun, Ji, Zhong, Chen, Xiangyu
While diffusion-based models have shown remarkable generative capabilities in static settings, their extension to continual learning (CL) scenarios remains fundamentally constrained by Generative Catastrophic Forgetting (GCF). We observe that even with a rehearsal buffer, new generative skills often overwrite previous ones, degrading performance on earlier tasks. Although some initial efforts have explored this space, most rely on heuristics borrowed from continual classification methods or use trained diffusion models as ad hoc replay generators, lacking a principled, unified solution for mitigating GCF, and experiments are often conducted under fragmented and inconsistent settings. To address this gap, we introduce Continual Diffusion Generation (CDG), a structured pipeline that redefines how diffusion models are implemented under CL and enables systematic evaluation of GCF. Beyond the empirical pipeline, we propose the first theoretical foundation for CDG, grounded in a cross-task analysis of diffusion-specific generative dynamics. Our theoretical investigation identifies three fundamental consistency principles essential for preserving knowledge in the rehearsal buffer over time: inter-task knowledge consistency, unconditional knowledge consistency, and prior knowledge consistency. These criteria expose the latent mechanisms through which generative forgetting manifests across sequential tasks. Motivated by these insights, we further propose \textit{Continual Consistency Diffusion} (CCD), a principled training framework that enforces these consistency objectives via hierarchical loss functions: $\mathcal{L}_{IKC}$, $\mathcal{L}_{UKC}$, and $\mathcal{L}_{PKC}$. Extensive experiments show that CCD achieves state-of-the-art performance across various benchmarks, especially improving generative metrics in overlapping-task scenarios.
- Europe > United Kingdom > UK North Sea (0.25)
- Atlantic Ocean > North Atlantic Ocean > North Sea > UK North Sea (0.25)
- North America > Canada > Ontario > Toronto (0.14)
- (4 more...)
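The three consistency terms in the CCD abstract are only named, not defined, so the sketch below shows the generic shape such an objective could take: each term is stood in for by a mean-squared discrepancy between the current denoiser and a frozen reference on rehearsal samples, with arbitrary weights. This is an assumption-laden stand-in, not CCD's actual losses.

```python
import torch
import torch.nn.functional as F

def match_frozen(current, frozen, x_t, t, cond):
    """One generic consistency term: keep the current denoiser's prediction
    close to a frozen reference on rehearsal-buffer samples."""
    with torch.no_grad():
        target = frozen(x_t, t, cond)
    return F.mse_loss(current(x_t, t, cond), target)

def ccd_style_objective(current, prev_task_model, pretrained_model,
                        x_t, t, cond, null_cond,
                        w_ikc=1.0, w_ukc=0.5, w_pkc=0.5):
    # Illustrative stand-ins for L_IKC, L_UKC, L_PKC; weights are arbitrary.
    l_ikc = match_frozen(current, prev_task_model, x_t, t, cond)       # inter-task knowledge
    l_ukc = match_frozen(current, prev_task_model, x_t, t, null_cond)  # unconditional knowledge
    l_pkc = match_frozen(current, pretrained_model, x_t, t, cond)      # prior knowledge
    return w_ikc * l_ikc + w_ukc * l_ukc + w_pkc * l_pkc

# Toy usage with stand-in denoisers that ignore the timestep and condition.
toy = lambda m: (lambda x_t, t, cond: m(x_t))
cur, prev, pre = (torch.nn.Linear(8, 8) for _ in range(3))
loss = ccd_style_objective(toy(cur), toy(prev), toy(pre),
                           torch.randn(4, 8), t=None, cond=None, null_cond=None)
loss.backward()
```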
StoryBench: A Dynamic Benchmark for Evaluating Long-Term Memory with Multi Turns
Long-term memory (LTM) is essential for large language models (LLMs) to achieve autonomous intelligence in complex, evolving environments. Despite increasing efforts in memory-augmented and retrieval-based architectures, there remains a lack of standardized benchmarks for systematically evaluating LLMs' long-term memory abilities. Existing benchmarks still struggle to evaluate knowledge retention and dynamic sequential reasoning, and they lack flexibility, all of which limits their effectiveness in assessing models' LTM capabilities. To address these gaps, we propose a novel benchmark framework based on interactive fiction games, featuring dynamically branching storylines with complex reasoning structures. These structures simulate real-world scenarios by requiring LLMs to navigate hierarchical decision trees, where each choice triggers cascading dependencies across multi-turn interactions. Our benchmark emphasizes two distinct settings to test reasoning complexity: one with immediate feedback upon incorrect decisions, and the other requiring models to independently trace back and revise earlier choices after failure. As part of this benchmark, we also construct a new dataset designed to test LLMs' LTM within narrative-driven environments. We further validate the effectiveness of our approach through detailed experiments. Experimental results demonstrate the benchmark's ability to robustly and reliably assess LTM in LLMs.
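As a rough illustration of the two evaluation settings described above (immediate feedback versus independent backtracking), the toy loop below walks an invented branching story; the data format and agent interface are placeholders, not the benchmark's actual schema.

```python
import random

# Invented toy branching story; each node offers choices leading to other nodes.
STORY = {
    "start": {"choices": {"door": "hall", "stairs": "dead_end"}},
    "hall": {"choices": {"key": "goal", "force": "dead_end"}},
    "dead_end": {"choices": {}},
    "goal": {"choices": {}},
}

def evaluate(choose, immediate_feedback=True, max_steps=20):
    """Walk the story tree. With immediate feedback, a failed branch bounces the
    agent back to its previous decision point; otherwise it must restart and
    retrace its own earlier choices after failure."""
    node, trail = "start", []
    for _ in range(max_steps):
        if node == "goal":
            return True, trail
        choices = STORY[node]["choices"]
        if not choices:  # failure leaf
            if immediate_feedback and trail:
                node = trail.pop()          # retry from the last decision point
            else:
                node, trail = "start", []   # must re-derive the whole path
            continue
        trail.append(node)
        node = choices[choose(node, list(choices))]
    return False, trail

# Usage with a placeholder agent that picks randomly (an LLM would go here).
print(evaluate(lambda node, options: random.choice(options)))
```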
Data Doping or True Intelligence? Evaluating the Transferability of Injected Knowledge in LLMs
Jan, Essa, Ali, Moiz, Hassan, Muhammad Saram, Zaffar, Fareed, Zaki, Yasir
As the knowledge of large language models (LLMs) becomes outdated over time, there is a growing need for efficient methods to update them, especially when injecting proprietary information. Our study reveals that comprehension-intensive fine-tuning tasks (e.g., question answering and fill-in-the-blank) achieve substantially higher knowledge retention rates (48%) than mapping-oriented tasks such as translation (17%) or text-to-JSON conversion (20%), despite exposure to identical factual content. We demonstrate that this pattern persists across model architectures and follows scaling laws, with larger models showing improved retention across all task types. However, all models exhibit significant performance drops when applying injected knowledge in broader contexts, suggesting limited semantic integration. These findings highlight the importance of task selection when updating LLM knowledge, showing that effective knowledge injection relies not just on data exposure but on the depth of cognitive engagement during fine-tuning.
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- Asia > Middle East > Syria (0.14)
- North America > United States > Pennsylvania (0.05)
- (15 more...)
- Government > Regional Government > North America Government > United States Government (1.00)
- Leisure & Entertainment (0.68)
- Government > Voting & Elections (0.68)
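To make the task-format contrast in the abstract above concrete, here is a small illustration of how a single fact might be rendered as a comprehension-style QA example versus a mapping-style text-to-JSON example; the templates and the fact itself are invented, as the abstract does not specify them.

```python
import json

fact = {"entity": "Acme Corp", "attribute": "founding_year", "value": "1999"}

def as_qa(fact):
    # Comprehension-oriented: the model must recall the value from a question.
    return {
        "prompt": f"In what year was {fact['entity']} founded?",
        "completion": f"{fact['entity']} was founded in {fact['value']}.",
    }

def as_text_to_json(fact):
    # Mapping-oriented: the value is already present in the input; the model
    # only reformats it, which the study links to weaker retention.
    sentence = f"{fact['entity']} was founded in {fact['value']}."
    return {
        "prompt": f"Convert to JSON: {sentence}",
        "completion": json.dumps({fact["attribute"]: fact["value"]}),
    }

print(as_qa(fact))
print(as_text_to_json(fact))
```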
Harmony: A Unified Framework for Modality Incremental Learning
Song, Yaguang, Yang, Xiaoshan, Jiang, Dongmei, Wang, Yaowei, Xu, Changsheng
Incremental learning aims to enable models to continuously acquire knowledge from evolving data streams while preserving previously learned capabilities. While current research predominantly focuses on unimodal incremental learning and on multimodal incremental learning where the modalities are consistent, real-world scenarios often present data from entirely new modalities, posing additional challenges. This paper investigates the feasibility of developing a unified model capable of incremental learning across continuously evolving modal sequences. To this end, we introduce a novel paradigm called Modality Incremental Learning (MIL), where each learning stage involves data from a distinct modality. To address this task, we propose a novel framework named Harmony, designed to achieve modal alignment and knowledge retention, enabling the model to reduce modal discrepancy, learn from a sequence of distinct modalities, and ultimately complete tasks across multiple modalities within a unified framework. Our approach introduces adaptive compatible feature modulation and cumulative modal bridging. By constructing historical modal features and performing modal knowledge accumulation and alignment, these components work in concert to bridge modal differences and maintain knowledge retention, even when only unimodal data is available at each learning stage. Extensive experiments on the MIL task demonstrate that our proposed method significantly outperforms existing incremental learning methods, validating its effectiveness in MIL scenarios.
No Forgetting Learning: Memory-free Continual Learning
Vahedifar, Mohammad Ali, Zhang, Qi
Continual Learning (CL) remains a central challenge in deep learning, where models must sequentially acquire new knowledge while mitigating Catastrophic Forgetting (CF) of prior tasks. Existing approaches often struggle with efficiency and scalability, requiring extensive memory or model buffers. This work introduces "No Forgetting Learning" (NFL), a memory-free CL framework that leverages knowledge distillation to maintain stability while preserving plasticity; memory-free means that NFL does not rely on any memory buffer. Through extensive evaluations on three benchmark datasets, we demonstrate that NFL achieves competitive performance while using approximately 14.75 times less memory than state-of-the-art methods. Furthermore, we introduce a new metric to better assess the plasticity-stability trade-off in CL.
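The abstract says only that NFL uses knowledge distillation without a memory buffer; the snippet below sketches the generic form such a memory-free objective often takes (an LwF-style distillation term between a frozen copy of the previous model and the current one). It is a stand-in under that assumption, not NFL's specific formulation.

```python
import torch
import torch.nn.functional as F

def memory_free_cl_loss(current_logits, frozen_prev_logits, labels,
                        temperature=2.0, alpha=0.5):
    """Cross-entropy on the new task plus a distillation term that keeps the
    current model's predictions close to a frozen copy of its previous self,
    so no samples from earlier tasks need to be stored."""
    ce = F.cross_entropy(current_logits, labels)
    kd = F.kl_div(
        F.log_softmax(current_logits / temperature, dim=-1),
        F.softmax(frozen_prev_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return (1 - alpha) * ce + alpha * kd

# Toy usage with random logits standing in for model outputs.
logits_new = torch.randn(8, 10, requires_grad=True)
logits_old = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = memory_free_cl_loss(logits_new, logits_old, labels)
loss.backward()
```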