


How Do Large Language Models Acquire Factual Knowledge During Pretraining?

Neural Information Processing Systems

Despite the recent observation that large language models (LLMs) can store substantial factual knowledge, there is limited understanding of how they acquire that knowledge through pretraining. This work addresses the gap by studying the dynamics of factual knowledge acquisition during pretraining, revealing several important insights. First, counterintuitively, we observe that pretraining on more data yields no significant improvement in the model's capability to acquire and maintain factual knowledge. Second, LLMs undergo forgetting of both memorization and generalization of factual knowledge, and LLMs trained on duplicated data forget faster. Third, training LLMs with larger batch sizes can enhance the models' robustness to forgetting. Overall, our observations suggest that factual knowledge acquisition in LLM pretraining occurs by progressively increasing the probability of the factual knowledge presented in the pretraining data at each step, an increase that is then diluted by subsequent forgetting. Based on this interpretation, we demonstrate that we can provide plausible explanations for recently observed behaviors of LLMs, such as their poor performance on long-tail knowledge and the benefits of deduplicating the pretraining corpus.
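The abstract's interpretation, that each presentation of a fact bumps its probability and forgetting then dilutes the bump, can be sketched as a toy trajectory. All constants and the decay form below are illustrative assumptions, not quantities from the paper.

```python
import math

def fact_probability(steps, exposure_steps, boost=0.15, decay=0.002, p0=0.01):
    """Toy trajectory of a fact's probability: each exposure bumps the
    probability; between exposures it decays exponentially back toward a
    baseline ('diluted by subsequent forgetting'). Illustrative only."""
    p = p0
    history = []
    exposures = set(exposure_steps)
    for t in range(steps):
        if t in exposures:
            p = min(1.0, p + boost)                # acquisition at each presentation
        else:
            p = p0 + (p - p0) * math.exp(-decay)   # forgetting toward baseline
        history.append(p)
    return history

traj = fact_probability(1000, exposure_steps=[100, 400, 700])
# probability jumps at steps 100, 400, 700 and decays in between
```

Under this sketch, duplicated data corresponds to closely spaced exposures whose bumps overlap before decay, consistent with the paper's deduplication observation.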


From Confidence to Collapse in LLM Factual Robustness

Fastowski, Alina, Prenkaj, Bardh, Kasneci, Gjergji

arXiv.org Artificial Intelligence

Ensuring the robustness of factual knowledge in LLMs is critical for reliable applications in tasks such as question answering and reasoning. However, existing evaluation methods predominantly focus on performance-based metrics, often from the perspective of prompt perturbations, which captures only the externally triggered side of knowledge robustness. To bridge this gap, we introduce a principled approach that measures factual robustness from the perspective of the generation process, analyzing token distribution entropy in combination with temperature scaling sensitivity. These two factors form the Factual Robustness Score (FRS), a novel metric that quantifies the stability of a fact against perturbations in decoding conditions, given its initial uncertainty. To validate our approach, we conduct extensive experiments on 5 LLMs across 3 closed-book QA datasets (SQuAD, TriviaQA, and HotpotQA). We show that factual robustness varies significantly -- smaller models report an FRS of $0.76$, larger ones $0.93$ -- with accuracy degrading by ~$60\%$ under increased uncertainty. These insights demonstrate how entropy and temperature scaling affect factual accuracy, and lay a foundation for developing more robust knowledge retention and retrieval in future models.
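The entropy-under-temperature signal the abstract builds on can be sketched as follows; the exact FRS formula is not given in the abstract, so this only shows the ingredient it combines (temperature-scaled next-token entropy), with made-up logits.

```python
import math

def token_entropy(logits, temperature=1.0):
    """Shannon entropy (in nats) of the softmax distribution over next
    tokens after temperature scaling. Higher temperature flattens the
    distribution and raises entropy; the FRS combines this kind of
    entropy signal with its sensitivity to temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    probs = [e / z for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [5.0, 1.0, 0.5, 0.1]                # a confidently answered fact
low_t = token_entropy(logits, temperature=0.5)
high_t = token_entropy(logits, temperature=2.0)
# entropy grows with temperature: low_t < high_t
```

A fact whose entropy grows steeply as temperature rises is, in the paper's terms, less robust: small decoding perturbations already destabilize it.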


When Language Shapes Thought: Cross-Lingual Transfer of Factual Knowledge in Question Answering

Kang, Eojin, Kim, Juae

arXiv.org Artificial Intelligence

Multilingual large language models (LLMs) offer promising opportunities for cross-lingual information access, yet their use of factual knowledge remains highly sensitive to the input language. Prior work has addressed this through English prompting and evaluation, assuming that English-based reasoning is universally beneficial. In this work, we challenge that assumption by exploring factual knowledge transfer from non-English to English through the lens of Language and Thought Theory. We introduce Language-to-Thought (L2T) prompting, which aligns the model's internal "thinking" language with the source language of the knowledge. Across three languages and four models, L2T consistently outperforms English-based reasoning, reversing the expected advantage of English prompts. Our code is available at https://github.com/GeomeunByeol/Language2Thought.
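The core idea, instructing the model to reason in the language the fact was likely learned in before answering, can be illustrated with a minimal prompt builder. The wording below is hypothetical and not the paper's actual template (which is in the linked repository).

```python
def l2t_prompt(question_en, knowledge_language):
    """Sketch of a Language-to-Thought style prompt: reason ('think') in
    the presumed source language of the knowledge, answer in English.
    Hypothetical phrasing, not the paper's template."""
    return (f"Think step by step in {knowledge_language}, since the relevant "
            f"facts were likely learned in {knowledge_language}. Then give "
            f"the final answer in English.\n"
            f"Question: {question_en}\nAnswer:")

prompt = l2t_prompt("Which Korean city hosted the 2018 Winter Olympics?",
                    "Korean")
```

The contrast with the baseline the paper challenges is simply the same template with English as the "thinking" language regardless of where the fact originates.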


ExplicitLM: Decoupling Knowledge from Parameters via Explicit Memory Banks

Yu, Chengzhang, Lu, Zening, Zheng, Chenyang, Wang, Chiyue, Zhang, Yiming, Jin, Zhanpeng

arXiv.org Artificial Intelligence

Large language models (LLMs) universally suffer from knowledge staleness and lack of interpretability due to their implicit knowledge storage paradigm, where information is distributed across network parameters in an entangled, non-addressable manner. This fundamental limitation prevents targeted knowledge updates, verification of stored information, and understanding of model reasoning processes. We propose ExplicitLM, a novel architecture that fundamentally reimagines knowledge storage in language models through an explicit, interpretable memory bank system. Our key innovation introduces a million-scale external memory bank where each entry stores human-readable knowledge as token sequences, enabling direct inspection and modification of the model's knowledge base. To efficiently access this massive repository, we design a differentiable two-stage retrieval mechanism that enables end-to-end training while maintaining discrete knowledge selection, combining efficient coarse-grained filtering with product key decomposition (reducing computational complexity from $O(N \cdot |I|)$ to $O(\sqrt{N} \cdot |I|)$) and fine-grained similarity matching through Gumbel-Softmax. Drawing inspiration from dual-system cognitive theory, we partition knowledge into frozen explicit facts (20%) and learnable implicit patterns (80%), maintained through an Exponential Moving Average update strategy that ensures training stability.
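The product-key decomposition behind the claimed complexity reduction can be sketched as follows: instead of scoring a query against all N = n×n memory keys, score its two halves against two sets of n sub-keys and combine only the top candidates. Shapes, scoring, and the top-k combination are illustrative, not ExplicitLM's actual design.

```python
import numpy as np

def product_key_topk(query, subkeys_a, subkeys_b, k=4):
    """Product-key lookup sketch: score query halves against two n-sized
    sub-key tables instead of all n*n virtual keys -- the O(sqrt(N))
    trick the abstract refers to. Key (i, j) scores s1[i] + s2[j]."""
    d = query.shape[0] // 2
    q1, q2 = query[:d], query[d:]
    s1 = subkeys_a @ q1                      # n scores for the first half
    s2 = subkeys_b @ q2                      # n scores for the second half
    top1 = np.argsort(s1)[-k:]               # best sub-keys per half
    top2 = np.argsort(s2)[-k:]
    cand = [(s1[i] + s2[j], i * subkeys_b.shape[0] + j)
            for i in top1 for j in top2]     # k*k candidate virtual keys
    cand.sort(reverse=True)
    return [idx for _, idx in cand[:k]]

rng = np.random.default_rng(0)
n, d = 32, 16                                # 32 * 32 = 1024 virtual keys
ids = product_key_topk(rng.normal(size=2 * d),
                       rng.normal(size=(n, d)),
                       rng.normal(size=(n, d)))
```

In the full architecture, a Gumbel-Softmax over the retrieved candidates would then keep the discrete selection differentiable for end-to-end training.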


Balancing Knowledge Updates: Toward Unified Modular Editing in LLMs

Liu, Jiahao, Wang, Zijian, Zhao, Kuo, Hu, Dong

arXiv.org Artificial Intelligence

Knowledge editing has emerged as an efficient approach for updating factual knowledge in large language models (LLMs), typically achieved by first locating key knowledge-storage modules and then modifying their parameters. However, most existing methods focus exclusively on updating the weights of Multi-Layer Perceptron (MLP) modules, which are commonly identified as the primary repositories of factual information. Other important components, such as attention (Attn) modules--one of the core modules in LLMs--are often ignored during editing. This biased allocation of updates can leave residual outdated knowledge in the model and limit the effectiveness of knowledge editing. In this paper, we conduct comprehensive and systematic knowledge localization experiments on advanced LLMs, revealing that Attn modules play a substantial role in factual knowledge storage and retrieval, especially in earlier layers. Building on these insights, we propose IntAttn-Edit, a novel method that extends the associative memory paradigm to jointly update both MLP and Attn modules. Our approach employs a knowledge balancing strategy that proportionally allocates update magnitudes based on each module's measured contribution to knowledge storage. Extensive experiments on popular benchmarks demonstrate that IntAttn-Edit consistently achieves superior results over existing methods, delivering higher edit success, improved generalization, and robust knowledge preservation. Further empirical analysis shows that our knowledge balancing strategy enables the editing performance to remain within the optimal range across different settings.
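The knowledge balancing strategy, splitting an edit's update magnitude across modules in proportion to their measured contribution, can be sketched in a few lines. The module names and contribution scores below are placeholders, not IntAttn-Edit's actual localization measurements.

```python
def balance_updates(total_update, contributions):
    """Allocate an edit's total update magnitude across modules in
    proportion to each module's measured contribution to knowledge
    storage -- the balancing idea described in the abstract."""
    total = sum(contributions.values())
    return {name: total_update * c / total
            for name, c in contributions.items()}

# e.g. localization attributes ~70% of a fact to MLP and ~30% to Attn
alloc = balance_updates(1.0, {"mlp": 0.7, "attn": 0.3})
```

The point of the proportional split is that Attn modules, especially in earlier layers, receive a nonzero share of the update instead of being ignored, which is what leaves residual outdated knowledge in MLP-only methods.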


A Diagnostic Benchmark for Sweden-Related Factual Knowledge

Kunz, Jenny

arXiv.org Artificial Intelligence

Many Swedish benchmarks are translated US-centric benchmarks, and therefore not suitable for testing knowledge that is particularly relevant, or even specific, to Sweden. We therefore introduce a manually written question-answering benchmark specifically targeting Sweden-related personalities and events, many of which receive very limited coverage in international media. Our annotators drew inspiration from a popular radio program featuring public figures from culture and media, as well as major sports events in Sweden. The dataset can be used to measure factual recall across models of varying sizes and degrees of Swedish coverage, and, as it contains English translations, allows probing cross-lingual factual consistency. Using the dataset, we find that smaller models with stronger Swedish coverage perform comparably to a three times larger multilingual model in recalling Sweden-related facts. We also observe that continued pre-training on Swedish generally improves factual knowledge but also leads to forgetting of part of the previously known information. These results demonstrate the dataset's potential as a diagnostic tool for studying knowledge retention in multilingual models during language adaptation.
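The two measurements the benchmark enables, factual recall and cross-lingual consistency via the parallel English translations, can be sketched with simple exact-match scoring. The normalization and the consistency definition below are assumptions; the actual evaluation protocol may differ.

```python
def _norm(s):
    """Light normalization before exact match: lowercase, collapse spaces."""
    return " ".join(s.lower().strip().split())

def factual_recall(preds, golds):
    """Fraction of questions answered with an exact (normalized) match."""
    return sum(_norm(p) == _norm(g) for p, g in zip(preds, golds)) / len(golds)

def crosslingual_consistency(sv_preds, en_preds, sv_golds, en_golds):
    """Of questions answered correctly in at least one language, the
    fraction answered correctly in both -- one way to probe the
    consistency the parallel translations enable."""
    both = either = 0
    for ps, pe, gs, ge in zip(sv_preds, en_preds, sv_golds, en_golds):
        ok_sv, ok_en = _norm(ps) == _norm(gs), _norm(pe) == _norm(ge)
        either += ok_sv or ok_en
        both += ok_sv and ok_en
    return both / either if either else 0.0

acc = factual_recall(["Zlatan Ibrahimovic ", "stockholm"],
                     ["Zlatan Ibrahimovic", "Göteborg"])
# → 0.5 (first answer matches after normalization, second does not)
```

Exact match is strict for open-ended answers about named entities; token-overlap or alias-aware matching would be a natural refinement.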

Tracing Multilingual Factual Knowledge Acquisition in Pretraining

Liu, Yihong, Wang, Mingyang, Kargaran, Amir Hossein, Körner, Felicia, Nie, Ercong, Plank, Barbara, Yvon, François, Schütze, Hinrich

arXiv.org Artificial Intelligence

Large Language Models (LLMs) are capable of recalling multilingual factual knowledge present in their pretraining data. However, most studies evaluate only the final model, leaving the development of factual recall and crosslingual consistency throughout pretraining largely unexplored. In this work, we trace how factual recall and crosslingual consistency evolve during pretraining, focusing on OLMo-7B as a case study. We find that both accuracy and consistency improve over time for most languages. We show that this improvement is primarily driven by the fact frequency in the pretraining corpus: more frequent facts are more likely to be recalled correctly, regardless of language. Yet, some low-frequency facts in non-English languages can still be correctly recalled. Our analysis reveals that these instances largely benefit from crosslingual transfer of their English counterparts -- an effect that emerges predominantly in the early stages of pretraining. We pinpoint two distinct pathways through which multilingual factual knowledge acquisition occurs: (1) frequency-driven learning, which is dominant and language-agnostic, and (2) crosslingual transfer, which is limited in scale and typically constrained to relation types involving named entities. We release our code and data to facilitate further research at https://github.com/cisnlp/multilingual-fact-tracing.