AITopics | asm

Transferring Causal Effects using Proxies

Neural Information Processing SystemsJun-20-2026, 23:11:16 GMT

We consider the problem of estimating a causal effect in a multi-domain setting. The causal effect of interest is confounded by an unobserved confounder and can change between the different domains. We assume that we have access to a proxy of the hidden confounder and that all variables are discrete or categorical. We propose methodology to estimate the causal effect in the target domain, where we assume to observe only the proxy variable. Under these conditions, we prove identifiability (even when treatment and response variables are continuous). We introduce two estimation techniques, prove consistency, and derive confidence intervals. The theoretical results are supported by simulation studies and a real-world example studying the causal effect of website rankings on consumer choices.

artificial intelligence, machine learning, supp, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (0.46)

Add feedback

Stochastic Gradients under Nuisances

Neural Information Processing SystemsJun-18-2026, 21:50:08 GMT

Stochastic gradient optimization is the dominant learning paradigm for a variety of scenarios, from classical supervised learning to modern self-supervised learning. We consider stochastic gradient algorithms for learning problems whose objectives rely on unknown nuisance parameters, and establish non-asymptotic convergence guarantees. Our results show that, while the presence of a nuisance can alter the optimum and upset the optimization trajectory, the classical stochastic gradient algorithm may still converge under appropriate conditions, such as Neyman orthogonality. Moreover, even when Neyman orthogonality is not satisfied, we show that an algorithm variant with approximately orthogonalized updates (with an approximately orthogonalized gradient oracle) may achieve similar convergence rates. Examples from orthogonal statistical learning/double machine learning and causal inference are discussed.

artificial intelligence, gprop, machine learning, (16 more...)

Neural Information Processing Systems

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.86)

Industry:

Education (0.87)
Health & Medicine (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

Add feedback

Overfitting and Generalizing with (PAC) Bayesian Prediction in Noisy Binary Classification

Zhu, Xiaohan, Ohannessian, Mesrob I., Srebro, Nathan

arXiv.org Machine LearningMar-25-2026

We consider a PAC-Bayes type learning rule for binary classification, balancing the training error of a randomized ''posterior'' predictor with its KL divergence to a pre-specified ''prior''. This can be seen as an extension of a modified two-part-code Minimum Description Length (MDL) learning rule, to continuous priors and randomized predictions. With a balancing parameter of $λ=1$ this learning rule recovers an (empirical) Bayes posterior and a modified variant recovers the profile posterior, linking with standard Bayesian prediction (up to the treatment of the single-parameter noise level). However, from a risk-minimization prediction perspective, this Bayesian predictor overfits and can lead to non-vanishing excess loss in the agnostic case. Instead a choice of $λ\gg 1$, which can be seen as using a sample-size-dependent-prior, ensures uniformly vanishing excess loss even in the agnostic case. We precisely characterize the effect of under-regularizing (and over-regularizing) as a function of the balance parameter $λ$, understanding the regimes in which this under-regularization is tempered or catastrophic. This work extends previous work by Zhu and Srebro [2025] that considered only discrete priors to PAC Bayes type learning rules and, through their rigorous Bayesian interpretation, to Bayesian prediction more generally.

artificial intelligence, log 1, machine learning, (19 more...)

arXiv.org Machine Learning

2603.22644

Country: North America > United States (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Multi-Domain Empirical Bayes for Linearly-Mixed Causal Representations

Wu, Bohan, von Kügelgen, Julius, Blei, David M.

arXiv.org Machine LearningMar-24-2026

Causal representation learning (CRL) aims to learn low-dimensional causal latent variables from high-dimensional observations. While identifiability has been extensively studied for CRL, estimation has been less explored. In this paper, we explore the use of empirical Bayes (EB) to estimate causal representations. In particular, we consider the problem of learning from data from multiple domains, where differences between domains are modeled by interventions in a shared underlying causal model. Multi-domain CRL naturally poses a simultaneous inference problem that EB is designed to tackle. Here, we propose an EB $f$-modeling algorithm that improves the quality of learned causal variables by exploiting invariant structure within and across domains. Specifically, we consider a linear measurement model and interventional priors arising from a shared acyclic SCM. When the graph and intervention targets are known, we develop an EM-style algorithm based on causally structured score matching. We further discuss EB $g$-modeling in the context of existing CRL approaches. In experiments on synthetic data, our proposed method achieves more accurate estimation than other methods for CRL.

artificial intelligence, citedonp, machine learning, (18 more...)

arXiv.org Machine Learning

2603.18404

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > Switzerland (0.04)
North America > United States > New York (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Add feedback

Appendices Table of Contents

Neural Information Processing SystemsFeb-16-2026, 01:09:54 GMT

Other concurrent works address, e.g., learning from soft interventions with polynomial mixing [

artificial intelligence, intervention, machine learning, (19 more...)

Neural Information Processing Systems

Genre: Collection (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

ed265bc903a5a097f61d3ec064d96d2e-Supplemental.pdf

Neural Information Processing SystemsFeb-11-2026, 00:08:36 GMT

asm, classification task, segmentation task, (14 more...)

Neural Information Processing Systems

Country:

North America > Canada (0.04)
Asia > Singapore (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

AdversarialStyleMiningforOne-Shot Unsupervised DomainAdaptation

Neural Information Processing SystemsFeb-11-2026, 00:08:28 GMT

Theintroduction ofDomainAdaptation (DA)techniquesaims to mitigate such performance drop when a trained agent encounters a different environment. By bridging the distribution gap between source and target domains, DA methods have shown their effect in many cross-domain tasks such as classification [27, 18], segmentation [19, 22, 23] and detection[3].

artificial intelligence, incvpr, machine learning, (16 more...)

Neural Information Processing Systems

Country: