sufficiency

Triangulation as an Acceptance Rule for Multilingual Mechanistic Interpretability

Long, Yanan

arXiv.org Machine Learning

Multilingual language models achieve strong aggregate performance yet often behave unpredictably across languages, scripts, and cultures. We argue that mechanistic explanations for such models should satisfy a \emph{causal} standard: claims must survive causal interventions and must \emph{cross-reference} across environments that perturb surface form while preserving meaning. We formalize \emph{reference families} as predicate-preserving variants and introduce \emph{triangulation}, an acceptance rule requiring necessity (ablating the circuit degrades the target behavior), sufficiency (patching activations transfers the behavior), and invariance (both effects remain directionally stable and of sufficient magnitude across the reference family). To supply candidate subgraphs, we adopt automatic circuit discovery and \emph{accept or reject} those candidates by triangulation. We ground triangulation in causal abstraction by casting it as an approximate transformation score over a distribution of interchange interventions, connect it to the pragmatic interpretability agenda, and present a comparative experimental protocol across multiple model families, language pairs, and tasks. Triangulation provides a falsifiable standard for mechanistic claims that filters spurious circuits passing single-environment tests but failing cross-lingual invariance.
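The acceptance rule described above lends itself to a compact decision procedure. A minimal sketch, assuming hypothetical ablate_effect and patch_effect callables that return effect sizes for necessity (ablation) and sufficiency (activation patching); the function names and thresholds are illustrative assumptions, not the paper's implementation:

# Hypothetical sketch of the triangulation acceptance rule.
# `ablate_effect` and `patch_effect` stand in for the necessity and
# sufficiency measurements; names and thresholds are illustrative.

def triangulate(circuit, reference_family, ablate_effect, patch_effect,
                tau_nec=0.2, tau_suf=0.2):
    """Accept `circuit` only if necessity and sufficiency effects are
    directionally stable and of sufficient magnitude across every
    predicate-preserving variant (invariance)."""
    for variant in reference_family:
        nec = ablate_effect(circuit, variant)  # drop in target behavior
        suf = patch_effect(circuit, variant)   # transfer of target behavior
        if nec < tau_nec or suf < tau_suf:     # invariance fails on a variant
            return False
    return True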


Likelihood-Preserving Embeddings for Statistical Inference

Akdemir, Deniz

arXiv.org Machine Learning

Modern machine learning embeddings provide powerful compression of high-dimensional data, yet they typically destroy the geometric structure required for classical likelihood-based statistical inference. This paper develops a rigorous theory of likelihood-preserving embeddings: learned representations that can replace raw data in likelihood-based workflows -- hypothesis testing, confidence interval construction, model selection -- without altering inferential conclusions. We introduce the Likelihood-Ratio Distortion metric $\Delta_n$, which measures the maximum error in log-likelihood ratios induced by an embedding. Our main theoretical contribution is the Hinge Theorem, which establishes that controlling $\Delta_n$ is necessary and sufficient for preserving inference. Specifically, if the distortion satisfies $\Delta_n = o_p(1)$, then (i) all likelihood-ratio based tests and Bayes factors are asymptotically preserved, and (ii) surrogate maximum likelihood estimators are asymptotically equivalent to full-data MLEs. We prove an impossibility result showing that universal likelihood preservation requires essentially invertible embeddings, motivating the need for model-class-specific guarantees. We then provide a constructive framework using neural networks as approximate sufficient statistics, deriving explicit bounds connecting training loss to inferential guarantees. Experiments on Gaussian and Cauchy distributions validate the sharp phase transition predicted by exponential family theory, and applications to distributed clinical inference demonstrate practical utility.
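One natural way to formalize the verbal definition of the Likelihood-Ratio Distortion (the notation $\ell_n$ for the log-likelihood of $n$ observations and $\phi$ for the embedding is ours; the paper's exact definition may differ):
$$\Delta_n \;=\; \sup_{\theta, \theta'} \Bigl|\, \bigl[\ell_n(\theta; \phi(X)) - \ell_n(\theta'; \phi(X))\bigr] - \bigl[\ell_n(\theta; X) - \ell_n(\theta'; X)\bigr] \,\Bigr|,$$
so that $\Delta_n = o_p(1)$ says every log-likelihood-ratio comparison computed from the embedding agrees, asymptotically, with the one computed from the raw data.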


Identification and Estimation of Joint Probabilities of Potential Outcomes in Observational Studies with Covariate Information

Neural Information Processing Systems

The joint probabilities of potential outcomes are fundamental components of causal inference in the sense that (i) if they are identifiable, then the causal risk is also identifiable, but not vice versa (Pearl, 2009; Tian and Pearl, 2000), and (ii) they enable us to evaluate the probabilistic aspects of ``necessity'', ``sufficiency'', and ``necessity and sufficiency'', which are important concepts of successful explanation (Watson et al., 2020). However, because they are not identifiable without any assumptions, various assumptions have been utilized to evaluate the joint probabilities of potential outcomes, e.g., the assumption of monotonicity (Pearl, 2009; Tian and Pearl, 2000), the independence between potential outcomes (Robins and Richardson, 2011), the condition of gain equality (Li and Pearl, 2019), and specific functional relationships between cause and effect (Pearl, 2009). Unlike existing identification conditions, in order to evaluate the joint probabilities of potential outcomes without such assumptions, this paper proposes two types of novel identification conditions using covariate information. In addition, when the joint probabilities of potential outcomes are identifiable through the proposed conditions, their estimation reduces to that of singular models, and thus they cannot be evaluated by standard statistical estimation methods. To solve this problem, this paper proposes a new statistical estimation method based on the augmented Lagrangian method and shows the asymptotic normality of the proposed estimators. Given space constraints, the proofs, the details of the statistical estimation method, some numerical experiments, and the case study are provided in the supplementary material.
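To make the objects concrete (standard notation from Tian and Pearl, 2000, not taken from this abstract): for binary treatment $X$ and outcome $Y$, with $Y_x$ the potential outcome under $do(X = x)$, the joint probabilities are $P(Y_1 = y_1, Y_0 = y_0)$, of which the probability of necessity and sufficiency is $\mathrm{PNS} = P(Y_1 = 1, Y_0 = 0)$. Without further assumptions it is only partially identified, via the Fréchet-style bounds
$$\max\{0,\; P(Y_1 = 1) - P(Y_0 = 1)\} \;\le\; \mathrm{PNS} \;\le\; \min\{P(Y_1 = 1),\; P(Y_0 = 0)\},$$
which is why identification conditions such as those proposed here are needed.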


A Framework for Causal Concept-based Model Explanations

Bjøru, Anna Rodum, Lysnæs-Larsen, Jacob, Jørgensen, Oskar, Strümke, Inga, Langseth, Helge

arXiv.org Artificial Intelligence

This work presents a conceptual framework for causal concept-based post-hoc Explainable Artificial Intelligence (XAI), based on the requirements that explanations for non-interpretable models should be understandable as well as faithful to the model being explained. Local and global explanations are generated by calculating the probability of sufficiency of concept interventions. Example explanations are presented, generated with a proof-of-concept model made to explain classifiers trained on the CelebA dataset. Understandability is demonstrated through a clear concept-based vocabulary, subject to an implicit causal interpretation. Fidelity is addressed by highlighting important framework assumptions, stressing that the context of explanation interpretation must align with the context of explanation generation.
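A sketch of how a probability-of-sufficiency score for a concept intervention could be estimated by simple Monte Carlo, loosely following Pearl's definition $\mathrm{PS} = P(Y_x = y \mid X = x', Y = y')$; the function names and the intervention mechanism are hypothetical assumptions, not the paper's API:

# Hypothetical Monte Carlo estimator in the spirit of the framework above.
# `model` maps an input to a predicted label, `has_concept` tests whether a
# concept is present, and `force_concept` returns a copy of the input with
# the concept intervened on; all three are illustrative placeholders.

def probability_of_sufficiency(model, samples, has_concept, force_concept, target):
    """Among inputs where the concept is absent and the prediction is not
    `target`, estimate how often forcing the concept on flips the model's
    prediction to `target` (Pearl-style probability of sufficiency)."""
    relevant = [x for x in samples if not has_concept(x) and model(x) != target]
    if not relevant:
        return float("nan")  # PS is undefined on this sample
    flipped = sum(model(force_concept(x)) == target for x in relevant)
    return flipped / len(relevant)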



Supplementary Material A Algorithmic Details

Neural Information Processing Systems

A.1 Data Selection via Time-Consistency. We use time-consistency (TCS) [63] to select informative samples to which our augmentation is applied; TCS can effectively improve performance. T is set to 5 for all experiments.

Algorithm 2 (Fast Lagrangian Attack Method). Input: training data $(x, y)$; the class-preserving margin $\sigma$; neural network $F(\cdot)$. Output: …

2) Proof of null-minimality. Since $X$ is a deterministic function of $Y$ and $N$, we have
$$I(X; Y \mid N) = I(X; Y) \quad (33)$$
Note that (33) holds for all sufficient statistics of $X$ w.r.t. … The proof of null-minimality is identical to the one under Problem (8). The two conditions in Theorem 4.2, Condition (a) or Condition (b), require that the augmentation … We show by Lemma B.2 that this InfoMin principle … In contrast, our Theorem 4.2 characterizes two key conditions of …
$$\sum_{x, y} P(X = x, Y = y) \log \frac{P(X = x, Y = y)}{P(X = x)\, P(Y = y)} \quad (44)$$
$$= I(X; Y) \quad (45)$$
where the third equation utilizes the property of symmetric augmentation.

Lemma B.2. If Assumption 4.1 holds, i.e., … Lemma B.2 can be obtained by a simple adaptation of Proposition 3.1 by Achille and Soatto […]. All models are trained for 300 epochs. All noise is symmetric noise.
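The identity in (44)-(45) is the definitional expansion of mutual information over a discrete joint distribution. A minimal numeric check; the joint table below is made up for illustration and is not from the paper:

# Summing P(x, y) * log[P(x, y) / (P(x) P(y))] over the joint table
# recovers the mutual information I(X; Y), as in (44)-(45).
import numpy as np

P = np.array([[0.3, 0.1],            # illustrative joint P(X = x, Y = y)
              [0.2, 0.4]])
Px = P.sum(axis=1, keepdims=True)    # marginal P(X)
Py = P.sum(axis=0, keepdims=True)    # marginal P(Y)
I = (P * np.log(P / (Px * Py))).sum()
print(f"I(X;Y) = {I:.4f} nats")      # positive, since X and Y are dependent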



A Appendix A.1 Sufficiency of J

Neural Information Processing Systems

We describe the proofs of the sufficiency results from Section 5 here. Note that the first term in Equation 7 is zero by the conditional independence assumptions in Figure 8. Equating the expansions, we can see that to satisfy our assumption that … The first term in Equation 9 is zero by the conditional independence assumptions in Figure 8. First note that the statement is not trivially true. Comparing this with the statement we would like to prove, we can see that the key idea is to show that the MI equivalence implies that $p(Y \mid Z, X = x) = p(Y \mid Z)$. The didactic examples are computed as follows.
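For reference, the standard information-theoretic fact this step relies on: conditional mutual information is a nonnegative expected log-ratio, so it vanishes exactly when the corresponding conditional independence holds,
$$I(X; Y \mid Z) \;=\; \mathbb{E}\!\left[\log \frac{p(Y \mid Z, X)}{p(Y \mid Z)}\right] \;\ge\; 0, \qquad I(X; Y \mid Z) = 0 \;\iff\; p(Y \mid Z, X = x) = p(Y \mid Z) \text{ almost surely}.$$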