
Collaborating Authors

 Marconato, Emanuele


All or None: Identifiable Linear Properties of Next-token Predictors in Language Modeling

arXiv.org Machine Learning

In natural language processing, it is well-established that linear relationships between high-dimensional, real-valued vector representations of textual inputs reflect semantic and syntactic patterns. This was motivated in seminal works [4, 5, 6, 7, 8] and extensively validated in word embedding models [9, 10, 11] as well as modern large language models trained for next-token prediction [2, 12, 13, 14, 15, 16, 17, 18, 19]. This ubiquity is puzzling, as different internal representations can produce identical next-token distributions, resulting in distribution-equivalent but internally distinct models. This raises a key question: Are the observed linear properties shared across all models with the same next-token distribution? Our main result is a mathematical proof that, under suitable conditions, certain linear properties hold for either all or none of the equivalent models generating a given next-token distribution. We demonstrate this through three main contributions. The first main contribution (Section 3) is an identifiability result characterizing distribution-equivalent next-token predictors. Our result is a generalization of the main theorems by Roeder et al. [3] and Khemakhem et al. [20], relaxing the assumptions of diversity and equal representation dimensionality. This result is of independent interest for research on identifiable representation learning since our analysis is applicable to several discriminative models beyond next-token prediction [3].
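To make the notion of distribution equivalence concrete, here is a minimal numerical sketch (not taken from the paper; the dimensions, variable names, and the map A are all illustrative): reparameterizing a next-token predictor with an invertible linear map applied to the context representation and its inverse applied to the unembedding matrix leaves the softmax over next tokens unchanged, so internally distinct models can induce the same next-token distribution.

    import numpy as np

    rng = np.random.default_rng(0)
    d, vocab = 8, 50                      # representation dimension, vocabulary size

    h = rng.normal(size=d)                # context representation f(x)
    U = rng.normal(size=(vocab, d))       # unembedding matrix

    # Reparameterize the internals with an (assumed) invertible linear map A.
    A = rng.normal(size=(d, d)) + 3.0 * np.eye(d)
    h_alt = A @ h                         # internally different representation
    U_alt = U @ np.linalg.inv(A)          # compensating unembedding

    def next_token_dist(unembed, rep):
        logits = unembed @ rep
        p = np.exp(logits - logits.max())
        return p / p.sum()

    print(np.allclose(next_token_dist(U, h), next_token_dist(U_alt, h_alt)))  # True

The two parameterizations assign identical probabilities to every next token, yet their representations differ by an arbitrary invertible linear map, which is precisely the kind of ambiguity the identifiability question above is about.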


Neuro-Symbolic Continual Learning: Knowledge, Reasoning Shortcuts and Concept Rehearsal

arXiv.org Artificial Intelligence

We introduce Neuro-Symbolic Continual Learning, where a model has to solve a sequence of neuro-symbolic tasks, that is, it has to map sub-symbolic inputs to high-level concepts and compute predictions by reasoning consistently with prior knowledge. Our key observation is that neuro-symbolic tasks, although different, often share concepts whose semantics remains stable over time. Traditional approaches fall short: existing continual strategies ignore knowledge altogether, while stock neuro-symbolic architectures suffer from catastrophic forgetting. We show that leveraging prior knowledge by combining neuro-symbolic architectures with continual strategies does help avoid catastrophic forgetting, but also that doing so can yield models affected by reasoning shortcuts.

We initiate the study of Neuro-Symbolic Continual Learning (NeSy-CL), in which the goal is to solve a sequence of neuro-symbolic tasks. As is common in neuro-symbolic (NeSy) prediction (Manhaeve et al., 2018; Xu et al., 2018; Giunchiglia & Lukasiewicz, 2020; Hoernle et al., 2022; Ahmed et al., 2022a), the machine is provided prior knowledge relating one or more target labels to symbolic, high-level concepts extracted from sub-symbolic data, and has to compute a prediction by reasoning over said concepts. The central challenge of NeSy-CL is that the data distribution and the knowledge may vary across tasks. E.g., in medical diagnosis, knowledge may encode known relationships between possible symptoms and conditions, while different tasks are characterized by different distributions of X-ray scans, symptoms and conditions. The goal, as in continual learning (CL) (Parisi et al., 2019), is to obtain a model that attains high accuracy on new tasks without forgetting what it has already learned under a limited storage budget.
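For readers unfamiliar with the NeSy prediction setup described above, the following toy sketch (all names and the two-concept rule are hypothetical choices for illustration) shows how a label probability can be obtained by reasoning over concepts extracted by a neural network, here via exact enumeration of the concept configurations that satisfy the knowledge.

    import itertools
    import torch

    # Toy prior knowledge: the label holds iff at least one of two binary concepts holds.
    def knowledge(c1, c2):
        return c1 or c2

    # Hypothetical neural concept extractor: sub-symbolic input -> concept probabilities.
    concept_net = torch.nn.Sequential(
        torch.nn.Linear(16, 32), torch.nn.ReLU(),
        torch.nn.Linear(32, 2), torch.nn.Sigmoid(),
    )

    def predict_label(x):
        """P(y=1 | x): sum the probability of all concept worlds satisfying the knowledge."""
        p = concept_net(x)                                # P(c1=1), P(c2=1)
        prob_y1 = 0.0
        for c1, c2 in itertools.product([0, 1], repeat=2):
            w = (p[0] if c1 else 1 - p[0]) * (p[1] if c2 else 1 - p[1])
            if knowledge(c1, c2):
                prob_y1 = prob_y1 + w
        return prob_y1

    print(float(predict_label(torch.randn(16))))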


Not All Neuro-Symbolic Concepts Are Created Equal: Analysis and Mitigation of Reasoning Shortcuts

arXiv.org Machine Learning

Neuro-Symbolic (NeSy) predictive models hold the promise of improved compliance with given constraints, systematic generalization, and interpretability, as they allow inferring labels that are consistent with some prior knowledge by reasoning over high-level concepts extracted from sub-symbolic inputs. It was recently shown that NeSy predictors are affected by reasoning shortcuts: they can attain high accuracy, but do so by leveraging concepts with unintended semantics, thus falling short of their promised advantages. Yet, a systematic characterization of reasoning shortcuts and of potential mitigation strategies is missing. This work fills this gap by characterizing them as unintended optima of the learning objective and identifying four key conditions behind their occurrence. Based on this, we derive several natural mitigation strategies, and analyze their efficacy both theoretically and empirically. Our analysis shows that reasoning shortcuts are difficult to deal with, casting doubt on the trustworthiness and interpretability of existing NeSy solutions.
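A minimal toy example of a reasoning shortcut (illustrative, not taken from the paper): if the prior knowledge states that the label is the XOR of two binary concepts, an extractor that systematically flips both concepts still satisfies the knowledge and matches every label, so a label-level objective cannot distinguish it from the intended model.

    import itertools

    # Prior knowledge: label = XOR of two binary concepts.
    def label(c1, c2):
        return c1 ^ c2

    intended = lambda c1, c2: (c1, c2)           # concepts with the intended semantics
    shortcut = lambda c1, c2: (1 - c1, 1 - c2)   # both concepts systematically flipped

    for c1, c2 in itertools.product([0, 1], repeat=2):
        assert label(*intended(c1, c2)) == label(*shortcut(c1, c2))  # labels always agree
        assert intended(c1, c2) != shortcut(c1, c2)                  # concepts never do

    print("the shortcut attains perfect label accuracy with wrong concepts on every input")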


Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning

arXiv.org Artificial Intelligence

Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglement. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
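As a rough illustration of the concept leakage mentioned above (a sketch under simplifying assumptions, not the paper's information-theoretic reformulation): if the learned representation of one concept lets a simple probe predict a different, independent ground-truth concept well above chance, information about the latter has leaked into the former. The synthetic data and probe below are purely illustrative.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    n = 2000

    # Two independent binary ground-truth concepts (say, "shape" and "color").
    g_a = rng.integers(0, 2, n)
    g_b = rng.integers(0, 2, n)

    # Learned representation for concept A that also encodes concept B (leakage).
    z_a = np.stack([g_a + 0.1 * rng.normal(size=n),
                    0.8 * g_b + 0.1 * rng.normal(size=n)], axis=1)

    # Leakage probe: predict concept B from concept A's representation.
    probe = LogisticRegression().fit(z_a[:1000], g_b[:1000])
    acc = probe.score(z_a[1000:], g_b[1000:])
    print(f"probe accuracy for concept B from z_A: {acc:.2f} (about 0.5 would mean no leakage)")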


Neuro-Symbolic Reasoning Shortcuts: Mitigation Strategies and their Limitations

arXiv.org Artificial Intelligence

Neuro-symbolic predictors learn a mapping from sub-symbolic inputs to higher-level concepts and then carry out (probabilistic) logical inference on this intermediate representation. This setup offers clear advantages in terms of consistency with symbolic prior knowledge, and is often believed to provide interpretability benefits in that, by virtue of complying with the knowledge, the learned concepts can be better understood by human stakeholders. However, it was recently shown that this setup is affected by reasoning shortcuts, whereby predictions attain high accuracy by leveraging concepts with unintended semantics, yielding poor out-of-distribution performance and compromising interpretability. In this short paper, we establish a formal link between reasoning shortcuts and the optima of the loss function, and identify situations in which reasoning shortcuts can arise. Based on this, we discuss limitations of natural mitigation strategies such as reconstruction and concept supervision.
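One of the mitigation strategies discussed here, concept supervision, amounts to adding a concept-level loss on (possibly few) annotated examples so that concept semantics is constrained directly rather than only through the label-level objective. The sketch below is a hedged rendering of that idea, not the paper's implementation; the function name and the weighting parameter alpha are illustrative.

    import torch
    import torch.nn.functional as F

    def nesy_loss(label_logits, y, concept_logits=None, c=None, alpha=0.5):
        """Label-level loss plus an optional concept-supervision term weighted by alpha."""
        loss = F.binary_cross_entropy_with_logits(label_logits, y)
        if concept_logits is not None and c is not None:
            loss = loss + alpha * F.binary_cross_entropy_with_logits(concept_logits, c)
        return loss

    # Dummy batch: 4 examples, 1 binary label, 2 binary concepts.
    label_logits = torch.randn(4)
    y = torch.randint(0, 2, (4,)).float()
    concept_logits = torch.randn(4, 2)
    c = torch.randint(0, 2, (4, 2)).float()
    print(nesy_loss(label_logits, y, concept_logits, c).item())

As the abstract notes, such strategies come with limitations of their own, so a term like this constrains concept semantics without guaranteeing that reasoning shortcuts are ruled out.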