AITopics | co-occurrence statistics

Co-occurrence is not Factual Association in Language Models

Neural Information Processing SystemsMar-21-2026, 04:42:22 GMT

Pretrained language models can encode a large amount of knowledge and utilize it for various reasoning tasks, yet they can still struggle to learn novel factual knowledge effectively from finetuning on limited textual demonstrations. In this work, we show that the reason for this deficiency is that language models are biased to learn word co-occurrence statistics instead of true factual associations. We identify the differences between two forms of knowledge representation in language models: knowledge in the form of co-occurrence statistics is encoded in the middle layers of the transformer model and does not generalize well to reasoning scenarios beyond simple question answering, while true factual associations are encoded in the lower layers and can be freely utilized in various reasoning tasks. Based on these observations, we propose two strategies to improve the learning of factual associations in language models. We show that training on text with implicit rather than explicit factual associations can force the model to learn factual associations instead of co-occurrence statistics, significantly improving the generalization of newly learned knowledge. We also propose a simple training method to actively forget the learned co-occurrence statistics, which unblocks and enhances the learning of factual associations when training on plain narrative text. On both synthetic and real-world corpora, the two proposed strategies improve the generalization of the knowledge learned during finetuning to reasoning scenarios such as indirect and multi-hop question answering.

artificial intelligence, factual association, natural language, (11 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

AMatrixChernoffBoundforMarkovChainsandIts ApplicationtoCo-occurrenceMatrices

Neural Information Processing SystemsFeb-10-2026, 14:24:56 GMT

We prove a Chernoff-type bound for sums of matrix-valued random variables sampled viaaregular (aperiodic and irreducible) finite Markovchain.

artificial intelligence, machine learning, markov chain, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.52)

Add feedback

775226eaa2a36c543e2bd6cc9eae1b6a-Paper-Conference.pdf

Neural Information Processing SystemsOct-10-2025, 06:31:31 GMT

factual association, knowledge, language model, (15 more...)

Neural Information Processing Systems

Country:

Europe > France (0.05)
North America > Canada > Ontario > Toronto (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.68)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

d63fbf8c3173730f82b150c5ef38b8ff-Paper.pdf

Neural Information Processing SystemsAug-16-2025, 15:46:49 GMT

co-occurrence matrix, markov chain, matrix, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.58)

Add feedback

Co-occurrence is not Factual Association in Language Models

Neural Information Processing SystemsMay-27-2025, 05:47:27 GMT

Pretrained language models can encode a large amount of knowledge and utilize it for various reasoning tasks, yet they can still struggle to learn novel factual knowledge effectively from finetuning on limited textual demonstrations. In this work, we show that the reason for this deficiency is that language models are biased to learn word co-occurrence statistics instead of true factual associations. We identify the differences between two forms of knowledge representation in language models: knowledge in the form of co-occurrence statistics is encoded in the middle layers of the transformer model and does not generalize well to reasoning scenarios beyond simple question answering, while true factual associations are encoded in the lower layers and can be freely utilized in various reasoning tasks. Based on these observations, we propose two strategies to improve the learning of factual associations in language models. We show that training on text with implicit rather than explicit factual associations can force the model to learn factual associations instead of co-occurrence statistics, significantly improving the generalization of newly learned knowledge.

co-occurrence statistics, factual association, language model, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

Add feedback

024d7f84fff11dd7e8d9c510137a2381-Reviews.html

Neural Information Processing SystemsFeb-11-2025, 18:04:02 GMT

This paper examines third-order co-occurrence statistics of edges in natural images. The main finding is that triples of edges co-occur in ways consistent with smooth flows, elongated along the tangent direction. Better statistical models of the geometry of natural image structure are crucial to understanding biological vision as well as making progress in computer vision, and the authors are correct in their claim that to date most models have focused on 2nd-order relationships, particularly the joint distribution of pairs of edges. Thus the investigation of higher-order statistics is of great interest. Overall the methods seem sound.

co-occurrence statistics, eigenvector, statistics, (15 more...)

Neural Information Processing Systems

Genre: Research Report (0.36)

Technology: Information Technology > Artificial Intelligence (0.90)

Add feedback

Co-occurrence is not Factual Association in Language Models

Zhang, Xiao, Li, Miao, Wu, Ji

arXiv.org Artificial IntelligenceSep-21-2024

Pretrained language models can encode a large amount of knowledge and utilize it for various reasoning tasks, yet they can still struggle to learn novel factual knowledge effectively from finetuning on limited textual demonstrations. In this work, we show that the reason for this deficiency is that language models are biased to learn word co-occurrence statistics instead of true factual associations. We identify the differences between two forms of knowledge representation in language models: knowledge in the form of co-occurrence statistics is encoded in the middle layers of the transformer model and does not generalize well to reasoning scenarios beyond simple question answering, while true factual associations are encoded in the lower layers and can be freely utilized in various reasoning tasks. Based on these observations, we propose two strategies to improve the learning of factual associations in language models. We show that training on text with implicit rather than explicit factual associations can force the model to learn factual associations instead of co-occurrence statistics, significantly improving the generalization of newly learned knowledge. We also propose a simple training method to actively forget the learned co-occurrence statistics, which unblocks and enhances the learning of factual associations when training on plain narrative text. On both synthetic and real-world corpora, the two proposed strategies improve the generalization of the knowledge learned during finetuning to reasoning scenarios such as indirect and multi-hop question answering.

factual association, knowledge, language model, (14 more...)

arXiv.org Artificial Intelligence

2409.14057

Country:

Europe > France (0.05)
North America > Canada > Ontario > Toronto (0.04)
Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)

Genre: Research Report (0.82)

Industry: Education (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Add feedback

Impact of Co-occurrence on Factual Knowledge of Large Language Models

Kang, Cheongwoong, Choi, Jaesik

arXiv.org Artificial IntelligenceOct-12-2023

Large language models (LLMs) often make factually incorrect responses despite their success in various applications. In this paper, we hypothesize that relying heavily on simple co-occurrence statistics of the pre-training corpora is one of the main factors that cause factual errors. Our results reveal that LLMs are vulnerable to the co-occurrence bias, defined as preferring frequently co-occurred words over the correct answer. Consequently, LLMs struggle to recall facts whose subject and object rarely co-occur in the pre-training dataset although they are seen during finetuning. We show that co-occurrence bias remains despite scaling up model sizes or finetuning. Therefore, we suggest finetuning on a debiased dataset to mitigate the bias by filtering out biased samples whose subject-object co-occurrence count is high. Although debiased finetuning allows LLMs to memorize rare facts in the training set, it is not effective in recalling rare facts unseen during finetuning. Further research in mitigation will help build reliable language models by preventing potential errors. The code is available at \url{https://github.com/CheongWoong/impact_of_cooccurrence}.

factual knowledge, knowledge, language model, (14 more...)

arXiv.org Artificial Intelligence

2310.08256

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > Canada > Ontario > National Capital Region > Ottawa (0.04)
North America > Canada > Ontario > Middlesex County > London (0.04)
(5 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.71)

Add feedback

Learn NLP the Stanford Way -- Lesson 2

#artificialintelligenceDec-9-2020, 03:15:15 GMT

In the previous post, we introduced NLP. To find out word meanings with the Python programming language, we used the NLTK package and worked our way into word embeddings using the gensim package and Word2vec. Since we only touched the Word2Vec technique from a 10,000-feet overview, we are now going to dive deeper into the training method to create a Word2vec model. The Word2vec (Mikolov et al. 2013)[1][2] is not a singular technique or algorithm. It's actually a family of neural network architectures and optimization techniques that can produce good results learning embeddings for large datasets.

cbow model, glove model, neural network, (16 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

A Matrix Chernoff Bound for Markov Chains and Its Application to Co-occurrence Matrices

Qiu, Jiezhong, Wang, Chi, Liao, Ben, Peng, Richard, Tang, Jie

arXiv.org Machine LearningOct-29-2020

We prove a Chernoff-type bound for sums of matrix-valued random variables sampled via a regular (aperiodic and irreducible) finite Markov chain. Specially, consider a random walk on a regular Markov chain and a Hermitian matrix-valued function on its state space. Our result gives exponentially decreasing bounds on the tail distributions of the extreme eigenvalues of the sample mean matrix. Our proof is based on the matrix expander (regular undirected graph) Chernoff bound [Garg et al. STOC '18] and scalar Chernoff-Hoeffding bounds for Markov chains [Chung et al. STACS '12]. Our matrix Chernoff bound for Markov chains can be applied to analyze the behavior of co-occurrence statistics for sequential data, which have been common and important data signals in machine learning. We show that given a regular Markov chain with $n$ states and mixing time $\tau$, we need a trajectory of length $O(\tau (\log{(n)}+\log{(\tau)})/\epsilon^2)$ to achieve an estimator of the co-occurrence matrix with error bound $\epsilon$. We conduct several experiments and the experimental results are consistent with the exponentially fast convergence rate from theoretical analysis. Our result gives the first bound on the convergence rate of the co-occurrence matrix and the first sample complexity analysis in graph representation learning.

artificial intelligence, machine learning, markov chain, (16 more...)

arXiv.org Machine Learning

2008.02464

Country: