Reversal Curse




The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Neural Information Processing Systems

Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies these limitations in information retrieval. To better understand them, we reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. We more closely simulate finetuning workflows that train pretrained models on specialized knowledge by introducing a realistic testbed based on Wikipedia knowledge graphs. Through a series of controlled experiments with increasing levels of realism, including non-reciprocal relations, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data will necessarily yield mixed results on downstream tasks unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
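
For intuition (our gloss, not wording from the paper): a two-token fact has a joint distribution that admits the factorizations $p(A, B) = p(A)\,p(B \mid A) = p(B)\,p(A \mid B)$. A next-token objective on documents of the form "A B" only ever fits $p(B \mid A)$; no training signal reaches $p(A \mid B)$, so probing in the reversed order queries a conditional the model never estimated. A factorization-agnostic objective, by contrast, trains the model to be consistent across orderings, so both conditionals are learned.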


Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

Neural Information Processing Systems

Auto-regressive large language models (LLMs) show impressive capabilities on many complex reasoning tasks while struggling with some simple logical reasoning tasks such as inverse search: when trained on "$A \to B$" (e.g., *Tom is the parent of John*), an LLM fails to directly conclude "$B \gets A$" (e.g., *John is the child of Tom*) during inference, even though the two sentences are semantically identical; this is known as the "reversal curse". In this paper, we theoretically analyze the reversal curse via the training dynamics of (stochastic) gradient descent for two auto-regressive models: (1) a bilinear model that can be viewed as a simplification of a one-layer transformer; (2) one-layer transformers under certain assumptions. Our analysis reveals that, for both models, the reversal curse is a consequence of the *asymmetry* of the (effective) model weights: an increase in the weights from a token $A$ to a token $B$ during training does not necessarily cause an increase in the weights from $B$ to $A$. This asymmetry arises from the training dynamics under certain choices of loss function and optimization space of model parameters. Moreover, our analysis applies naturally to other logical reasoning tasks such as chain-of-thought (CoT), providing a new perspective that differs from previous work focused on expressivity. Finally, we conduct experiments to validate our theory on multi-layer transformers under different settings.
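
The weight-asymmetry mechanism can be seen in a few lines of code. Below is a minimal sketch (our illustration with an arbitrarily small vocabulary, not the paper's exact setup): a bilinear next-token model $p(j \mid i) = \mathrm{softmax}(W_i)_j$ trained by gradient descent on only the forward pair $A \to B$. Only row $W_A$ ever receives gradients, so $W_{AB}$ grows while $W_{BA}$ stays at its initialization and the reverse query remains uniform.

```python
# Minimal sketch (our illustration, not the paper's exact setup): a bilinear
# next-token model p(next = j | prev = i) = softmax(W[i])_j trained only on the
# forward pair A -> B. Gradient descent increases W[A, B] but never touches
# W[B, A]; that weight asymmetry is the reversal curse in miniature.
import numpy as np

V = 4                      # toy vocabulary: 0 = A, 1 = B, 2..3 = distractors
A, B = 0, 1
W = np.zeros((V, V))       # logits: row = previous token, column = next token
lr = 0.5

for _ in range(200):       # SGD on cross-entropy for the single example "A B"
    logits = W[A]
    probs = np.exp(logits) / np.exp(logits).sum()
    grad = probs.copy()
    grad[B] -= 1.0         # d(cross-entropy)/d(logits) = softmax - one_hot
    W[A] -= lr * grad      # only row A is ever updated

probs_fwd = np.exp(W[A]) / np.exp(W[A]).sum()
probs_bwd = np.exp(W[B]) / np.exp(W[B]).sum()
print(probs_fwd[B])        # ~1.0: "A -> ?" recalls B
print(probs_bwd[A])        # 0.25: "B -> ?" is still uniform -- no reverse fact
```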


Delving into the Reversal Curse: How Far Can Large Language Models Generalize?

Neural Information Processing Systems

A prime example is the recently debated reversal curse, which surfaces when models, having been trained on the fact "A is B", struggle to generalize this knowledge to infer that "B is A". In this paper, we examine the manifestation of the reversal curse across various tasks and delve into both the generalization abilities and the problem-solving mechanisms of LLMs. This investigation leads to a series of significant insights: (1) LLMs are able to generalize to "B is A" when both A and B are presented in the context, as in a multiple-choice question. (2) This generalization ability is highly correlated with the structure of the fact "A is B" in the training documents. For example, generalization only occurs for biographies structured as [Name] is [Description], not as [Description] is [Name]. (3) We propose and verify the hypothesis that LLMs possess an inherent bias in fact recall during knowledge application, which explains and underscores the importance of document structure for successful learning. (4) The negative impact of this bias on the downstream performance of LLMs can hardly be mitigated through training alone. Based on these intriguing findings, our work not only presents a novel perspective for interpreting LLMs' generalization abilities in terms of their intrinsic working mechanisms but also provides new insights for the development of more effective learning methods for LLMs.
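
To make findings (1) and (2) concrete, here is a small illustrative sketch; the names and templates are hypothetical stand-ins of ours, not the paper's data:

```python
# Hypothetical templates (ours, for illustration only) contrasting the document
# structures and probe formats discussed in the abstract.
name, desc = "Alex Quill", "the author of 'Silent Rivers'"

# Training-document structures: generalization to "B is A" is reported only
# for the first form.
doc_name_first = f"{name} is {desc}."                 # [Name] is [Description]
doc_desc_first = f"{desc.capitalize()} is {name}."    # [Description] is [Name]

# Free-recall probe in the reversed order -- the case where the curse bites.
recall_probe = f"Question: Who is {desc}? Answer:"

# Multiple-choice probe: both A and B appear in the context, the case that
# finding (1) says models handle.
mcq_probe = (f"Question: Who is {desc}?\n"
             f"Choices: (a) {name} (b) Jordan Reyes\nAnswer:")

print(doc_name_first)
print(mcq_probe)
```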



Bilinear relational structure fixes reversal curse and enables consistent model editing

Kim, Dong-Kyum, Kim, Minsung, Kwon, Jea, Yang, Nakyeong, Cha, Meeyoung

arXiv.org Artificial Intelligence

The reversal curse -- a language model's (LM) inability to infer an unseen fact "B is A" from a learned fact "A is B" -- is widely considered a fundamental limitation. We show that this is not an inherent failure but an artifact of how models encode knowledge. By training LMs from scratch on a synthetic dataset of relational knowledge graphs, we demonstrate that bilinear relational structure emerges in their hidden representations. Crucially, we also find that this bilinear structure plays a key role in consistent model editing. When a fact is updated in an LM with this structure, the edit correctly propagates to its reverse and other logically dependent facts. In contrast, models lacking this representation not only suffer from the reversal curse but also fail to generalize edits, introducing further logical inconsistencies. Our results establish that training on a relational knowledge dataset induces the emergence of bilinear internal representations, which in turn enable LMs to behave in a logically consistent manner after editing. This implies that the success of model editing depends critically not just on editing algorithms but on the underlying representational geometry of the knowledge being modified.

Language models (LMs) have become powerful tools for knowledge-intensive tasks, yet their reasoning capabilities often fall short of human-level logical consistency (Berglund et al., 2024; Allen-Zhu & Li, 2025); a prominent example is the reversal curse: a model trained on "A is the parent of B" frequently fails to infer the reverse fact, "B is the child of A." This failure suggests that LMs learn shallow, directional associations rather than robust, symmetrical relationships, undermining their reliability. Ensuring logical consistency is particularly challenging in model editing, which seeks to update factual knowledge in a trained model without costly retraining from scratch.
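
As a rough sketch of the geometry involved (our reading of the idea, not code from the paper): with entity embeddings $h_A, h_B$ and a relation matrix $W_r$, a bilinear score $s(A, r, B) = h_A^\top W_r h_B$ makes the inverse relation available as the transpose, $W_{r^{-1}} = W_r^\top$, so a rank-one edit that strengthens one fact strengthens its reverse by construction:

```python
# Minimal sketch (our reading of the idea, not the paper's code): a bilinear
# relational score s(A, r, B) = h_A^T W_r h_B. If the inverse relation is
# scored with the transpose, W_{r^-1} = W_r^T, then editing one fact
# automatically updates its reverse -- the consistency the abstract describes.
import numpy as np

rng = np.random.default_rng(0)
d = 8
h = {"Tom": rng.normal(size=d), "John": rng.normal(size=d)}
W_parent = rng.normal(size=(d, d)) * 0.01   # relation "is the parent of"

def score(a, W_r, b):
    return h[a] @ W_r @ h[b]

# "Edit": push up the fact (Tom, parent, John) with a rank-one update.
W_parent += 0.5 * np.outer(h["Tom"], h["John"])

fwd = score("Tom", W_parent, "John")        # edited fact: high score
rev = score("John", W_parent.T, "Tom")      # reverse fact via W_parent^T
print(np.isclose(fwd, rev))                 # True: the edit propagates
```

A model without this structure stores the forward and reverse directions in unrelated parameters, so an edit to one leaves the other stale, which is the editing failure mode the abstract contrasts against.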