thm
Counterfactual Image Editing with Disentangled Causal Latent Space
The process of editing an image can be naturally modeled as evaluating a counterfactual query: "What would an image look like if a particular feature had changed?" While recent advances in text-guided image editing leverage powerful pre-trained models to produce visually appealing images, they often lack counterfactual consistency - ignoring how features are causally related and how changing one may affect others. In contrast, existing causal-based editing approaches offer solid theoretical foundations and perform well in specific settings, but remain limited in scalability and often rely on labeled data. In this work, we aim to bridge the gap between causal editing and large-scale text-to-image generation through two main contributions. First, we introduce Backdoor Disentangled Causal Latent Space (BD-CLS), a new class of latent spaces that allows for the encoding of causal inductive biases. One desirable property of this latent space is that, even under weak supervision, it can be shown to exhibit counterfactual consistency. Second, and building on this result, we develop BD-CLS-Edit, an algorithm capable of learning a BD-CLS from a (non-causal) pre-trained Stable Diffusion model. This enables counterfactual image editing without retraining. Our method ensures that edits respect the causal relationships among features, even when some features are unlabeled or unprompted and the original latent space is oblivious to the environment's underlying cause-and-effect relationships.
From Black-box to Causal-box: Towards Building More Interpretable Models
Understanding the predictions made by deep learning models remains a central challenge, especially in high-stakes applications. A promising approach is to equip models with the ability to answer counterfactual questions - hypothetical "what if?" scenarios that go beyond the observed data and provide insight into a model reasoning. In this work, we introduce the notion of causal interpretability, which formalizes when counterfactual queries can be evaluated from a specific class of models and observational data. We analyze two common model classes - blackbox and concept-based predictors - and show that neither is causally interpretable in general. To address this gap, we develop a framework for building models that are causally interpretable by design. Specifically, we derive a complete graphical criterion that determines whether a given model architecture supports a given counterfactual query. This leads to a fundamental tradeoff between causal interpretability and predictive accuracy, which we characterize by identifying the unique maximal set of features that yields an interpretable model with maximal predictive expressiveness. Experiments corroborate the theoretical findings.
Confounder Detection via Treatment Intent: A New Observational Study Design
Plecko, Drago, Okanovic, Patrik, Hoefler, Torsten, Bareinboim, Elias
Understanding the effects of interventions is central to scientific progress, with randomized controlled trials (RCTs) regarded as the gold standard for causal inference in many applied fields. However, RCTs are costly, time-consuming, and often constrained by ethical or practical limitations, motivating the need for causal methods able to draw conclusions from observational data. While such data is collected at ever larger scale, making its use for causal inference is often hindered by the fact that not all variables affecting treatment allocation and the outcome are observed - an issue known as unobserved confounding. In this paper, we introduce a new study design called confounder detection via treatment intent. The idea is to query a human expert who makes treatment decisions, and ask them to compare pairs of units proposed by a principled matching strategy, with the goal of eliciting unobserved variables that explain why treatment decisions differ. We provide a theoretical basis for such a procedure, ascertaining conditions under which such a study design may elicit unobserved confounders. Building on this newly established foundations, we study treatment effects of interventions in the intensive care unit (ICU). First, we show empirical evidence strongly indicating that electronic health records (EHRs) collected in ICUs are subject to unobserved confounding. By using clinical text notes as a proxy for physicians' knowledge and leveraging natural language processing, we provide a proof of concept for our methodology in a semi-synthetic environment with a known ground truth.
Integral Probability Metrics PAC-Bayes Bounds
We present a PAC-Bayes-style generalization bound which enables the replacement of the KL-divergence with a variety of Integral Probability Metrics (IPM). We provide instances of this bound with the IPM being the total variation metric and the Wasserstein distance. A notable feature of the obtained bounds is that they naturally interpolate between classical uniform convergence bounds in the worst case (when the prior and posterior are far away from each other), and improved bounds in favorable cases (when the posterior and prior are close). This illustrates the possibility of reinforcing classical generalization bounds with algorithm-and data-dependent components, thus making them more suitable to analyze algorithms that use a large hypothesis space.
Error Bounds for Learning with Vector-Valued Random Features
This paper provides a comprehensive error analysis of learning with vector-valued random features (RF). The theory is developed for RF ridge regression in a fully general infinite-dimensional input-output setting, but nonetheless applies to and improves existing finite-dimensional analyses. In contrast to comparable work in the literature, the approach proposed here relies on a direct analysis of the underlying risk functional and completely avoids the explicit RF ridge regression solution formula in terms of random matrices. This removes the need for concentration results in random matrix theory or their generalizations to random operators. The main results established in this paper include strong consistency of vector-valued RF estimators under model misspecification and minimax optimal convergence rates in the well-specified setting. The parameter complexity (number of random features) and sample complexity (number of labeled data) required to achieve such rates are comparable with Monte Carlo intuition and free from logarithmic factors.
Appendix
We first introduce some handy concepts and results to make the proof succinct, meanwhile providing more information for understanding our model and theory. We begin with some extended discussions on CSG. Note that a reparameterization unnecessarily has its output dimensions in S, i.e. The condition that p(y|s) = p0(y|ΦS(s,v)) for any v V does not indicate that ΦS(s,v) is constant of v, since p0(y|s0) may ignore the change of s0 = ΦS(s,v) from the change of v. The following lemma shows the meaning of a reparameterization: it allows a CSG to vary while inducing the same distribution on the observed data variables (x,y) (i.e., holding the same effect on describing data). We can now define and verify an equivalent relation on CSGs so that the resulting equivalent class contains CSGs that induce the same (x,y) data distribution and hold the same semantic information in their svariables. We say two CSGs pand p0 are semantic-equivalent, if there exists a homeomorphism11 Φ on S V, such that (i) is semantic-preserving: its output dimensions in S is constant of v, ΦS(s,v) = ΦS(s) for any v V, and (ii) it acts as a reparameterization from p to p0: Φ#[ps,v] = p0s,v, p(x|s,v) = p0(x|Φ(s,v)) and p(y|s) = p0(y|ΦS(s)). A.1 below shows that the defined binary relation is indeed an equivalence relation in common cases. As a reparameterization, Φ allows the two models to have different latent-variable parameterizations while inducing the same distribution on the observed data variables (x,y) (Lemma 9). This definition of semantic-equivalence can be rephrased as the existence of a semantic-preserving reparameterization. With proper model assumptions, we can show that any reparameterization between two CSGs is semantic-preserving, so that semantic-preserving CSGs cannot be converted to each other by a reparameterization that mixes swith v. Lemma 11. For two CSGs pand p0, if p0(y|s) has a statistics M0(s) that is an injective function of s, then any reparameterization Φ from pto p0, if exists, has its ΦS constant of v. Proof. Then the condition that p(y|s) = p0(y|ΦS(s,v)) for any v V indicates that M(s) = M0(ΦS(s,v)). If there exist s S and v(1) 6= v(2) V such that ΦS(s,v(1)) 6= ΦS(s,v(2)), then M0(ΦS(s,v(1))) 6= M0(ΦS(s,v(2))) 11A transformation is a homeomorphism if it is a continuous bijection with continuous inverse. This violates M(s) = M0(ΦS(s,v)) which requires both M0(ΦS(s,v(1))) and M0(ΦS(s,v(2))) to be equal to M(s). We then introduce two mathematical facts. Let z be a random variable on a Euclidean space RdZ with density function pz(z), and let Φ be a homeomorphism on RdZ whose inverse Φ 1 is differentiable.
_NeurIPS2023_CR__Certified_Backdoor_Detection.pdf
The main purpose of this research is to provide the user of DNN classifiers with a method to detect if the model is backdoor attacked without access to the training set. All attacks used to evaluate our detection method in this paper are created by published backdoor attack strategies on public datasets. Thus, we did not create new threats to society. Moreover, our work provides a new perspective on backdoor defense, as it is the first to address the certification of backdoor detection. It helps other researchers to understand the behavior of deep learning systems facing malicious activities. While existing backdoor detectors are all empirical [67, 20, 75, 41, 69, 6, 56, 13], our work initiates a new research direction - backdoor detection with certification. Moreover, we first exposed that certified backdoor detectors and certified robustness against backdoor attacks complement each other [86, 71, 27, 53].
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model
Tarbouriech, Jean, Pirotta, Matteo, Valko, Michal, Lazaric, Alessandro
We study the sample complexity of learning an $ε$-optimal policy in the Stochastic Shortest Path (SSP) problem. We first derive sample complexity bounds when the learner has access to a generative model. We show that there exists a worst-case SSP instance with $S$ states, $A$ actions, minimum cost $c_{\min}$, and maximum expected cost of the optimal policy over all states $B_{\star}$, where any algorithm requires at least $Ω(SAB_{\star}^3/(c_{\min}ε^2))$ samples to return an $ε$-optimal policy with high probability. Surprisingly, this implies that whenever $c_{\min} = 0$ an SSP problem may not be learnable, thus revealing that learning in SSPs is strictly harder than in the finite-horizon and discounted settings. We complement this lower bound with an algorithm that matches it, up to logarithmic factors, in the general case, and an algorithm that matches it up to logarithmic factors even when $c_{\min} = 0$, but only under the condition that the optimal policy has a bounded hitting time to the goal state.
Task Ecologies and the Evolution of World-Tracking Representations in Large Language Models
We study language models as evolving model organisms and ask when autoregressive next-token learning selects for world-tracking representations. For any encoding of latent world states, the Bayes-optimal next-token cross-entropy decomposes into the irreducible conditional entropy plus a Jensen--Shannon excess term. That excess vanishes if and only if the encoding preserves the training ecology's equivalence classes. This yields a precise notion of ecological veridicality for language models and identifies the minimum-complexity zero-excess solution as the quotient partition by training equivalence. We then determine when this fixed-encoding analysis applies to transformer families: frozen dense and frozen Mixture-of-Experts transformers satisfy it, in-context learning does not enlarge the model's separation set, and per-task adaptation breaks the premise. The framework predicts two characteristic failure modes: simplicity pressure preferentially removes low-gain distinctions, and training-optimal models can still incur positive excess on deployment ecologies that refine the training ecology. A conditional dynamic extension shows how inter-model selection and post-training can recover such gap distinctions under explicit heredity, variation, and selection assumptions. Exact finite-ecology checks and controlled microgpt experiments validate the static decomposition, split-merge threshold, off-ecology failure pattern, and two-ecology rescue mechanism in a regime where the relevant quantities are directly observable. The goal is not to model frontier systems at scale, but to use small language models as laboratory organisms for theory about representational selection.