AITopics

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre: Research Report > Experimental Study (0.93)

Industry:

Health & Medicine (1.00)
Transportation > Ground > Road (0.94)
Transportation > Passenger (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)
Information Technology > Data Science (0.68)

Neural Information Processing SystemsFeb-11-2026, 21:09:13 GMT

Debiased Bayesian inference for average treatment effects

Kolyan Ray, Botond Szabo

Workinginthestandard potential outcomes framework, we propose a data-driven modification to an arbitrary (nonparametric) prior based on the propensity score that corrects for the first-orderposteriorbias,therebyimprovingperformance.Weillustrateourmethod for Gaussian process (GP) priors using (semi-)synthetic data.

artificial intelligence, machine learning, posterior, (18 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Neural Information Processing SystemsFeb-8-2026, 16:45:12 GMT

Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies

Figure 2: Latentz extracted by DSEs.

artificial intelligence, machine learning, urlhttp, (15 more...)

Country:

North America > United States > Tennessee (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)
Europe > Albania > Durrës County > Durrës (0.04)

Industry: Health & Medicine (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.68)

Brock, Tobias, Nagler, Thomas

Fast Rates for Nonstationary Weighted Risk Minimization

arXiv.org Machine LearningFeb-6-2026

Weighted empirical risk minimization is a common approach to prediction under distribution drift. This article studies its out-of-sample prediction error under nonstationarity. We provide a general decomposition of the excess risk into a learning term and an error term associated with distribution drift, and prove oracle inequalities for the learning error under mixing conditions. The learning bound holds uniformly over arbitrary weight classes and accounts for the effective sample size induced by the weight vector, the complexity of the weight and hypothesis classes, and potential data dependence. We illustrate the applicability and sharpness of our results in (auto-) regression problems with linear models, basis approximations, and neural networks, recovering minimax-optimal rates (up to logarithmic factors) when specialized to unweighted and stationary settings.

artificial intelligence, machine learning, theorem 3, (17 more...)

2602.05742

Country:

Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Neural Information Processing SystemsDec-24-2025, 19:08:51 GMT

Partial Identification of Treatment Effects with Implicit Generative Models

We consider the problem of partial identification, the estimation of bounds on the treatment effects from observational data. Although studied using discrete treatment variables or in specific causal graphs (e.g., instrumental variables), partial identification has been recently explored using tools from deep generative modeling. We propose a new method for partial identification of average treatment effects (ATEs) in general causal graphs using implicit generative models comprising continuous and discrete random variables. Since ATE with continuous treatment is generally non-regular, we leverage the partial derivatives of response functions to define a regular approximation of ATE, a quantity we call uniform average treatment derivative (UATD). We prove that our algorithm converges to tight bounds on ATE in linear structural causal models (SCMs). For nonlinear SCMs, we empirically show that using UATD leads to tighter and more stable bounds than methods that directly optimize the ATE.

implicit generative model, partial identification, treatment effect, (6 more...)

Technology: Information Technology > Artificial Intelligence (0.83)

Jin, Jikai, Syrgkanis, Vasilis

Sharp Structure-Agnostic Lower Bounds for General Functional Estimation

arXiv.org Machine LearningDec-22-2025

The design of efficient nonparametric estimators has long been a central problem in statistics, machine learning, and decision making. Classical optimal procedures often rely on strong structural assumptions, which can be misspecified in practice and complicate deployment. This limitation has sparked growing interest in structure-agnostic approaches -- methods that debias black-box nuisance estimates without imposing structural priors. Understanding the fundamental limits of these methods is therefore crucial. This paper provides a systematic investigation of the optimal error rates achievable by structure-agnostic estimators. We first show that, for estimating the average treatment effect (ATE), a central parameter in causal inference, doubly robust learning attains optimal structure-agnostic error rates. We then extend our analysis to a general class of functionals that depend on unknown nuisance functions and establish the structure-agnostic optimality of debiased/double machine learning (DML). We distinguish two regimes -- one where double robustness is attainable and one where it is not -- leading to different optimal rates for first-order debiasing, and show that DML is optimal in both regimes. Finally, we instantiate our general lower bounds by deriving explicit optimal rates that recover existing results and extend to additional estimands of interest. Our results provide theoretical validation for widely used first-order debiasing methods and guidance for practitioners seeking optimal approaches in the absence of structural assumptions. This paper generalizes and subsumes the ATE lower bound established in \citet{jin2024structure} by the same authors.

assumption 6, perturbation, theorem 7, (15 more...)

2512.17341

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.67)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

arXiv.org Artificial IntelligenceDec-3-2025

LLM-as-a-Supervisor: Mistaken Therapeutic Behaviors Trigger Targeted Supervisory Feedback

Xu, Chen, Lv, Zhenyu, Lan, Tian, Wang, Xianyang, Ji, Luyao, Cui, Leyang, Yang, Minqiang, Shen, Jian, Dong, Qunxi, Liu, Xiuling, Wang, Juan, Hu, Bin

Although large language models (LLMs) hold significant promise in psychotherapy, their direct application in patient-facing scenarios raises ethical and safety concerns. Therefore, this work shifts towards developing an LLM as a supervisor to train real therapists. In addition to the privacy of clinical therapist training data, a fundamental contradiction complicates the training of therapeutic behaviors: clear feedback standards are necessary to ensure a controlled training system, yet there is no absolute "gold standard" for appropriate therapeutic behaviors in practice. In contrast, many common therapeutic mistakes are universal and identifiable, making them effective triggers for targeted feedback that can serve as clearer evidence. Motivated by this, we create a novel therapist-training paradigm: (1) guidelines for mistaken behaviors and targeted correction strategies are first established as standards; (2) a human-in-the-loop dialogue-feedback dataset is then constructed, where a mistake-prone agent intentionally makes standard mistakes during interviews naturally, and a supervisor agent locates and identifies mistakes and provides targeted feedback; (3) after fine-tuning on this dataset, the final supervisor model is provided for real therapist training. The detailed experimental results of automated, human and downstream assessments demonstrate that models fine-tuned on our dataset MATE, can provide high-quality feedback according to the clinical guideline, showing significant potential for the therapist training scenario.

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.09042

Genre: Research Report > New Finding (0.68)

Industry:

Health & Medicine > Health Care Providers & Services (0.93)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.30)

Sert, Gözde, Chakrabortty, Abhishek, Bhattacharya, Anirban

Bayesian Semiparametric Causal Inference: Targeted Doubly Robust Estimation of Treatment Effects

arXiv.org Machine LearningNov-21-2025

We propose a semiparametric Bayesian methodology for estimating the average treatment effect (ATE) within the potential outcomes framework using observational data with high-dimensional nuisance parameters. Our method introduces a Bayesian debiasing procedure that corrects for bias arising from nuisance estimation and employs a targeted modeling strategy based on summary statistics rather than the full data. These summary statistics are identified in a debiased manner, enabling the estimation of nuisance bias via weighted observables and facilitating hierarchical learning of the ATE. By combining debiasing with sample splitting, our approach separates nuisance estimation from inference on the target parameter, reducing sensitivity to nuisance model specification. We establish that, under mild conditions, the marginal posterior for the ATE satisfies a Bernstein-von Mises theorem when both nuisance models are correctly specified and remains consistent and robust when only one is correct, achieving Bayesian double robustness. This ensures asymptotic efficiency and frequentist validity. Extensive simulations confirm the theoretical results, demonstrating accurate point estimation and credible intervals with nominal coverage, even in high-dimensional settings. The proposed framework can also be extended to other causal estimands, and its key principles offer a general foundation for advancing Bayesian semiparametric inference more broadly.

artificial intelligence, machine learning, posterior, (19 more...)

2511.15904

Country: North America > United States (0.92)

Genre: Research Report > Experimental Study (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Neural Information Processing SystemsNov-20-2025, 06:38:10 GMT

Towards Understanding Evolving Patterns in Sequential Data

In many machine learning tasks, data is inherently sequential.

information, machine learning, natural language, (19 more...)

Country:

North America > Canada (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Europe > Italy (0.04)

Genre: Research Report > Experimental Study (0.93)

Industry:

Education (0.67)
Information Technology (0.46)
Banking & Finance (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(2 more...)

Wiederkehr, Christoph, Heumann, Christian, Schomaker, Michael

Causal Effect Estimation with TMLE: Handling Missing Data and Near-Violations of Positivity

arXiv.org Machine LearningOct-28-2025

We evaluate the performance of targeted maximum likelihood estimation (TMLE) for estimating the average treatment effect in missing data scenarios under varying levels of positivity violations. We employ model- and design-based simulations, with the latter using undersmoothed highly adaptive lasso on the 'WASH Benefits Bangladesh' dataset to mimic real-world complexities. Five missingness-directed acyclic graphs are considered, capturing common missing data mechanisms in epidemiological research, particularly in one-point exposure studies. These mechanisms include also not-at-random missingness in the exposure, outcome, and confounders. We compare eight missing data methods in conjunction with TMLE as the analysis method, distinguishing between non-multiple imputation (non-MI) and multiple imputation (MI) approaches. The MI approaches use both parametric and machine-learning models. Results show that non-MI methods, particularly complete cases with TMLE incorporating an outcome-missingness model, exhibit lower bias compared to all other evaluated missing data methods and greater robustness against positivity violations across. In Comparison MI with classification and regression trees (CART) achieve lower root mean squared error, while often maintaining nominal coverage rates. Our findings highlight the trade-offs between bias and coverage, and we recommend using complete cases with TMLE incorporating an outcome-missingness model for bias reduction and MI CART when accurate confidence intervals are the priority.

artificial intelligence, bayesian inference, machine learning, (18 more...)