conditional expectation
Optimal Adjustment Sets for Nonparametric Estimation of Weighted Controlled Direct Effect
The weighted controlled direct effect (WCDE) generalizes the standard controlled direct effect (CDE) by averaging over the mediator distribution, providing a robust estimate when treatment effects vary across mediator levels. This makes the WCDE especially relevant in fairness analysis, where it isolates the direct effect of an exposure on an outcome, independent of mediating pathways. This work establishes three fundamental advances for WCDE in observational studies: First, we establish necessary and sufficient conditions for the identifiability of the WCDE, clarifying when it diverges from the CDE. Next, we consider nonparametric estimation of the WCDE and derive its influence function, focusing on the class of regular and asymptotically linear estimators. Lastly, we characterize the optimal covariate adjustment set that minimizes the asymptotic variance, demonstrating how mediator-confounder interactions introduce distinct requirements compared to average treatment effect (ATE) estimation. Using synthetic and real-world data, we validate our theory numerically, showing that the proposed optimal valid adjustment set yields the lowest variance at practical sample sizes. Our results offer a principled framework for efficient estimation of direct effects in complex causal systems, with practical applications in fairness and mediation analysis.
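As a reading aid, the relation between the CDE and the WCDE can be written out as follows. This is a sketch under assumptions not stated in the abstract: a binary exposure $a \in \{0, 1\}$, a discrete mediator $m$, potential outcomes $Y(a, m)$, and a generic weight function $w$ (for instance the marginal mediator distribution). The paper's exact weighting may differ.

```latex
\begin{align*}
\mathrm{CDE}(m) &= \mathbb{E}\left[Y(1, m)\right] - \mathbb{E}\left[Y(0, m)\right], \\
\mathrm{WCDE}   &= \sum_{m} w(m)\,\mathrm{CDE}(m), \qquad w(m) \ge 0, \quad \sum_{m} w(m) = 1.
\end{align*}
```

Under this notation, if $\mathrm{CDE}(m)$ is constant in $m$, the weighted average equals that constant for any choice of $w$. The two quantities can only differ when the direct effect varies across mediator levels.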
Kernel conditional tests from learning-theoretic bounds
We propose a framework for hypothesis testing on conditional probability distributions, which we then use to construct statistical tests of functionals of conditional distributions. These tests identify the inputs where the functionals differ with high probability, and include tests of conditional moments or two-sample tests. Our key idea is to transform confidence bounds of a learning method into a test of conditional expectations.
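The step from confidence bounds to a test can be illustrated with a minimal sketch. It assumes two estimated conditional means on a shared grid of inputs, each with a confidence radius that holds uniformly with probability at least 1 - δ/2. By a union bound, wherever the two bands are disjoint the true conditional expectations differ with probability at least 1 - δ. The function name, the grid, and the constant radii are hypothetical and are not taken from the paper.

```python
import numpy as np

def rejection_mask(mean_a, radius_a, mean_b, radius_b):
    """Flag inputs where the two confidence bands do not overlap."""
    gap = np.abs(np.asarray(mean_a) - np.asarray(mean_b))
    return gap > (np.asarray(radius_a) + np.asarray(radius_b))

# Hypothetical estimates of two conditional means on a shared input grid.
x_grid = np.linspace(0.0, 1.0, 5)
mean_a = np.sin(2 * np.pi * x_grid)
mean_b = np.zeros_like(x_grid)
radius_a = np.full_like(x_grid, 0.2)  # assumed confidence radius for estimate A
radius_b = np.full_like(x_grid, 0.2)  # assumed confidence radius for estimate B

# True at inputs where the test declares the conditional means different.
differs = rejection_mask(mean_a, radius_a, mean_b, radius_b)
```

In practice the radii would come from the learning method's own bound and would vary with the input; the constant values here only keep the example short.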
Latent Process Generator Matching
Billera, Lukas, Nordlinder, Hedwig Nora, Murrell, Ben
A related situation arises when an auxiliary process is introduced to aid training but modelling its dynamics at generation time is unnecessary or difficult, as in Billera et al. [2025b] and Kim et al. [2025]. In each of these works, the projection result and its associated loss are derived on a case-by-case basis, and all theorems are restricted to marginalization over a discrete component of the extended state space. We introduce a general framework that removes these restrictions: given a time-inhomogeneous Feller process $(Y_t)_{0 \le t \le 1}$ on an arbitrary state space $Y$ and a map $\Phi: Y \to X$, one may learn a linear parametrisation of the generator of a Feller process on $X$ whose one-time marginals coincide with those of $(\Phi(Y_t))_{0 \le t \le 1}$. For $Y = X \times Z$ and $\Phi$ the projection onto the first coordinate, this subsumes these prior works as special cases, allowing for a general class of latent processes $(Z_t)_{0 \le t \le 1}$ in a nearly arbitrary state space $Z$, using the formalism of generator matching to allow for continuous, discrete, or manifold-valued processes. In particular, the learnt process at $t = 1$ samples from the distribution of $\Phi(Y_1)$, which is the desired data distribution. We give sufficient conditions for a loss function to be valid in this general setting, recovering the results of the works cited above as corollaries. This result has broad applicability, enabling the construction of a wide array of new flow matching schemes by allowing for a more general class of latent spaces. As a concrete new application, we outline a non-projection $\Phi: Y \to X$ with manifold-valued latents for protein structure generation that separates chain-level rigid-body motion from internal flexibility (Section 4). Here the particular chain-level versus residue-level or internal state is latent, and the model only sees the world state; we plan to implement this application in future work.
2 EARLIER WORK
Several recent generative models train with the aid of a latent stochastic process that is marginalised out at generation time.
Ambient Diffusion: Learning Clean Distributions from Corrupted Data
We present the first diffusion-based framework that can learn an unknown distribution using only highly-corrupted samples. This problem arises in scientific applications where access to uncorrupted samples is impossible or expensive to acquire. Another benefit of our approach is the ability to train generative models that are less likely to memorize any individual training sample, since they never observe clean training data. Our main idea is to introduce additional measurement distortion during the diffusion process and require the model to predict the original corrupted image from the further corrupted image. We prove that our method leads to models that learn the conditional expectation of the full uncorrupted image given this additional measurement corruption. This holds for any corruption process that satisfies some technical conditions (and in particular includes inpainting and compressed sensing). We train models on standard benchmarks (CelebA, CIFAR-10 and AFHQ) and show that we can learn the distribution even when all the training samples have 90% of their pixels missing. We also show that we can finetune foundation models on small corrupted datasets (e.g. MRI scans with block corruptions) and learn the clean distribution without memorizing the training set.
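The "further corruption" idea can be sketched for the inpainting case. This minimal example assumes a binary pixel mask as the corruption, an extra drop probability `delta`, and a squared-error loss evaluated only on pixels observed in the original corrupted sample. The function names and the value of `delta` are hypothetical, the diffusion noise itself is omitted, and this is an illustration rather than the authors' implementation.

```python
import numpy as np

def further_corrupt(mask, delta, rng):
    """Drop each still-observed pixel independently with probability delta."""
    keep = rng.random(mask.shape) >= delta
    return mask * keep

def masked_mse(pred, corrupted, mask):
    """Mean squared error over pixels observed in the original corrupted sample."""
    diff = mask * (pred - corrupted)
    return float(np.sum(diff ** 2) / max(mask.sum(), 1.0))

rng = np.random.default_rng(0)
clean = rng.normal(size=(32, 32))                    # never shown to the model
mask = (rng.random((32, 32)) >= 0.9).astype(float)   # roughly 90% of pixels missing
corrupted = mask * clean                             # the only available training data

further_mask = further_corrupt(mask, delta=0.1, rng=rng)
model_input = further_mask * corrupted               # what a model would receive

# A trivial "model" that returns its input is penalised on the pixels that
# were hidden by the extra corruption but are known from the corrupted sample.
loss = masked_mse(model_input, corrupted, mask)
```

Because the model cannot tell which missing pixels were hidden by the extra mask and which were never observed, it is pushed to predict all of them, which is the intuition behind recovering a conditional expectation of the full image.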
On Learning Fairness and Accuracy on Multiple Subgroups
We propose an analysis in fair learning that preserves the utility of the data while reducing prediction disparities under the criterion of group sufficiency. We focus on the scenario where the data contains multiple or even many subgroups, each with a limited number of samples. For this setting, we present a principled method for learning a fair predictor for all subgroups by formulating it as a bilevel objective. In the lower level, the subgroup-specific predictors are learned from a small amount of data and the fair predictor. In the upper level, the fair predictor is updated to be close to all subgroup-specific predictors. We further prove that such a bilevel objective can effectively control the group sufficiency and generalization error. We evaluate the proposed framework on real-world datasets. Empirical evidence suggests consistently improved fair predictions, with accuracy comparable to the baselines.
Error Propagation and Model Collapse in Diffusion Models: A Theoretical Study
Khelifa, Nail B., Turner, Richard E., Venkataramanan, Ramji
Machine learning models are increasingly trained or fine-tuned on synthetic data. Recursively training on such data has been observed to significantly degrade performance in a wide range of tasks, often characterized by a progressive drift away from the target distribution. In this work, we theoretically analyze this phenomenon in the setting of score-based diffusion models. For a realistic pipeline where each training round uses a combination of synthetic data and fresh samples from the target distribution, we obtain upper and lower bounds on the accumulated divergence between the generated and target distributions. This allows us to characterize different regimes of drift, depending on the score estimation error and the proportion of fresh data used in each generation. We also provide empirical results on synthetic data and images to illustrate the theory.
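The role of the fresh-data proportion can be illustrated with a toy simulation. This sketch is not the score-based diffusion setting analysed in the paper: it assumes a one-dimensional Gaussian model refitted by sample mean and standard deviation in each round, with target N(0, 1). The function names and parameters are hypothetical.

```python
import numpy as np

def gaussian_kl(mu, sigma):
    """KL divergence from N(mu, sigma^2) to the target N(0, 1)."""
    return float(-np.log(sigma) + 0.5 * (sigma ** 2 + mu ** 2) - 0.5)

def recursive_fit(n_rounds, n_samples, fresh_fraction, seed=0):
    """Refit a Gaussian each round on a mix of synthetic and fresh samples."""
    rng = np.random.default_rng(seed)
    mu, sigma = 0.0, 1.0                      # start at the target distribution
    n_fresh = int(round(fresh_fraction * n_samples))
    history = []
    for _ in range(n_rounds):
        synthetic = rng.normal(mu, sigma, size=n_samples - n_fresh)
        fresh = rng.normal(0.0, 1.0, size=n_fresh)
        data = np.concatenate([synthetic, fresh])
        mu, sigma = float(data.mean()), float(data.std())
        history.append((mu, sigma))
    return history

# Compare the divergence from the target after 50 rounds with and without fresh data.
for fraction in (0.0, 0.5):
    mu, sigma = recursive_fit(50, 200, fraction)[-1]
    print(f"fresh_fraction={fraction}: KL to target = {gaussian_kl(mu, sigma):.4f}")
```

Varying `fresh_fraction` and `n_samples` gives a rough feel for how estimation error per round and the share of fresh data interact, which is the kind of regime distinction the abstract describes.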