All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling

Emanuele Marconato, Sébastien Lachapelle, Sebastian Weichwald, Luigi Gresele

arXiv.org Machine Learning 

In natural language processing, it is well-established that linear relationships between high-dimensional, real-valued vector representations of textual inputs reflect semantic and syntactic patterns. This was motivated in seminal works [4, 5, 6, 7, 8] and extensively validated in word embedding models [9, 10, 11] as well as modern large language models trained for next-token prediction [2, 12, 13, 14, 15, 16, 17, 18, 19]. This ubiquity is puzzling, as different internal representations can produce identical next-token distributions, resulting in distribution-equivalent but internally distinct models. This raises a key question: Are the observed linear properties shared across all models with the same next-token distribution? Our main result is a mathematical proof that, under suitable conditions, certain linear properties hold for either all or none of the equivalent models generating a given next-token distribution. We demonstrate this through three main contributions. The first main contribution (Section 3) is an identifiability result characterizing distribution-equivalent next-token predictors. Our result generalizes the main theorems of Roeder et al. [3] and Khemakhem et al. [20], relaxing the assumptions of diversity and equal representation dimensionality. This result is of independent interest for research on identifiable representation learning, since our analysis applies to several discriminative models beyond next-token prediction [3].
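To make the notion of distribution-equivalence concrete, the following is a minimal numpy sketch (illustrative only, not taken from the paper), assuming a standard softmax-unembedding parameterization where logits are an unembedding matrix applied to a hidden representation: transforming the representations by any invertible matrix A while compensating in the unembedding with A^{-1} yields a model whose internal representations differ but whose next-token distribution is identical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab = 8, 20                      # toy hidden size and vocabulary size

h = rng.normal(size=d)                # hidden representation of some context
W = rng.normal(size=(vocab, d))       # unembedding matrix: representations -> logits

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

# Original model: next-token distribution from logits W @ h
p = softmax(W @ h)

# A distribution-equivalent model: transform the representation by an
# invertible matrix A and compensate in the unembedding with A^{-1}.
A = rng.normal(size=(d, d)) + d * np.eye(d)   # well-conditioned, invertible
h_tilde = A @ h                               # internally different representation
W_tilde = W @ np.linalg.inv(A)                # compensating unembedding

p_tilde = softmax(W_tilde @ h_tilde)

# The two models disagree internally but induce the same next-token distribution.
assert not np.allclose(h, h_tilde)
assert np.allclose(p, p_tilde)
print("max probability difference:", np.abs(p - p_tilde).max())
```

The "all or none" claim of the paper concerns exactly this equivalence class: under the stated conditions, a linear property of the representations either holds for every such equivalent model or for none of them.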