What does the Knowledge Neuron Thesis Have to do with Knowledge?

Jingcheng Niu, Andrew Liu, Zining Zhu, Gerald Penn

arXiv.org Artificial Intelligence 

We reassess the Knowledge Neuron (KN) Thesis: an interpretation of the mechanism by which large language models recall facts from a training corpus. This nascent thesis proposes that facts are recalled from the training corpus through the MLP weights in a manner resembling key-value memory, implying in effect that "knowledge" is stored in the network. Furthermore, by modifying the MLP modules, one can control the language model's generation of factual information. The plausibility of the KN thesis has been demonstrated by the success of KN-inspired model-editing methods (Dai et al., 2022; Meng et al., 2022). We find that this thesis is, at best, an oversimplification. Not only can we edit the expression of certain linguistic phenomena using the same model-editing methods, but a more comprehensive evaluation shows that the KN thesis does not adequately explain the process of factual expression. While one can argue that the MLP weights store complex patterns that are interpretable both syntactically and semantically, these patterns do not constitute "knowledge." To gain a more comprehensive understanding of the knowledge-representation process, we must look beyond the MLP weights and explore recent models' complex layer structures and attention mechanisms.

Recent research has highlighted the remarkable ability of large pretrained language models (PLMs) to recall facts from a training corpus (Petroni et al., 2019). The underlying mechanism by which this information is stored and retrieved within PLMs, however, remains a subject of intensive investigation. The Knowledge Neuron (KN) Thesis has recently been proposed as a novel framework for interpreting language models (LMs) (Dai et al., 2022; Meng et al., 2022; 2023). This thesis suggests that LMs operate akin to key-value memories, recalling facts from the training corpus through the multi-layer perceptron (MLP) weights. A significant implication of the KN thesis is therefore that factual information generation by LMs can be controlled by modifying the MLP modules. Should this manipulation of factual recall prove feasible, it could lead to language models that are more controllable, interpretable, and factually aligned. The plausibility of the KN thesis is demonstrated by the success of KN-inspired model-editing methods: Dai et al. (2022) argued that relational facts can be localised to as few as 2-5 MLP neurons.
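To make the key-value memory reading of the MLP concrete, the sketch below is an illustrative assumption, not code from any of the cited papers. It treats a single transformer MLP layer as a key-value store: the rows of the first weight matrix act as "keys" matched against the hidden state, the columns of the second act as "values" written back toward the residual stream, and a KN-style edit amounts to suppressing one neuron's contribution. All dimensions, weights, and the choice of neuron are toy placeholders; Dai et al. (2022) select neurons by integrated-gradients attribution rather than the crude activation-magnitude proxy used here.

```python
# Minimal sketch of the key-value memory view of a transformer MLP layer,
# assuming toy dimensions and random weights (not any real model's parameters).
import torch

d_model, d_ff = 8, 32  # hypothetical sizes; real models are far larger
torch.manual_seed(0)
W_in = torch.randn(d_ff, d_model)   # each row: a "key" matched against x
W_out = torch.randn(d_model, d_ff)  # each column: a "value" written to the output

def mlp(x: torch.Tensor) -> torch.Tensor:
    """Two-layer MLP: neuron activations weight the value columns of W_out."""
    a = torch.relu(W_in @ x)  # a[i]: how strongly key i fires on this input
    return W_out @ a          # weighted sum of value vectors

x = torch.randn(d_model)      # stand-in for a hidden state at some token position
base = mlp(x)

# KN-style "edit": suppress one purported knowledge neuron by zeroing its
# activation, deleting its value vector from the layer's output.
a = torch.relu(W_in @ x)
neuron = int(a.argmax())      # crude proxy for an attributed knowledge neuron
a_edit = a.clone()
a_edit[neuron] = 0.0
edited = W_out @ a_edit

print(f"output shift from suppressing neuron {neuron}: "
      f"{(base - edited).norm().item():.3f}")
```

Under the KN thesis, locating a fact means finding the handful of coordinates of the activation vector whose value columns carry it; the paper's critique is that such coordinates behave more like complex, syntactically and semantically interpretable token-expression patterns than like stored knowledge.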
