- North America > United States > New York (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- North America > United States > Massachusetts (0.04)
- North America > United States > California (0.04)
- North America > United States > Nebraska (0.04)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Hong Kong (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.70)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Speech (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Rate doubly robust estimation for weighted average treatment effects
Wang, Yiming, Liu, Yi, Yang, Shu
The weighted average treatment effect (WATE) defines a versatile class of causal estimands for populations characterized by propensity score weights, including the average treatment effect (ATE), treatment effect on the treated (ATT), on controls (ATC), and for the overlap population (ATO). WATE has broad applicability in social and medical research, as many datasets from these fields align with its framework. However, the literature lacks a systematic investigation into the robustness and efficiency conditions for WATE estimation. Although doubly robust (DR) estimators are well-studied for ATE, their applicability to other WATEs remains uncertain. This paper investigates whether widely used WATEs admit DR or rate doubly robust (RDR) estimators and assesses the role of nuisance function accuracy, particularly with machine learning. Using semiparametric efficient influence function (EIF) theory and double/debiased machine learning (DML), we propose three RDR estimators under specific rate and regularity conditions and evaluate their performance via Monte Carlo simulations. Applications to NHANES data on smoking and blood lead levels, and SIPP data on 401(k) eligibility, demonstrate the methods' practical relevance in medical and social sciences.
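For orientation, a minimal sketch in standard notation from the weighting literature (not necessarily the authors' own): writing e(X) for the propensity score and \mu_a(X) = \mathbb{E}[Y \mid A=a, X] for the outcome regressions, the WATE family is indexed by a tilting function h,
\[
\tau_h \;=\; \frac{\mathbb{E}\bigl[h(X)\,\{\mu_1(X)-\mu_0(X)\}\bigr]}{\mathbb{E}[h(X)]},
\qquad
h(X)=\begin{cases}
1 & \text{(ATE)}\\
e(X) & \text{(ATT)}\\
1-e(X) & \text{(ATC)}\\
e(X)\{1-e(X)\} & \text{(ATO)}.
\end{cases}
\]
In the familiar ATE case, rate double robustness asks only that the product of the estimation errors of \hat{e} and \hat{\mu}_a vanish faster than n^{-1/2}, which is what licenses flexible machine-learning nuisance estimators under DML; the question raised in the abstract is whether, and under what rate conditions, analogous guarantees hold for the other members of this family.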
- North America > United States > North Carolina > Wake County > Raleigh (0.04)
- North America > United States > New York (0.04)
- Research Report > Strength High (1.00)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
A Computable Measure of Suboptimality for Entropy-Regularised Variational Objectives
Chazal, Clémentine, Kanagawa, Heishiro, Shen, Zheyang, Korba, Anna, Oates, Chris J.
Several emerging post-Bayesian methods target a probability distribution for which an entropy-regularised variational objective is minimised. This increased flexibility introduces a computational challenge, as one loses access to an explicit unnormalised density for the target. To mitigate this difficulty, we introduce a novel measure of suboptimality called 'gradient discrepancy', and in particular a 'kernel gradient discrepancy' (KGD) that can be explicitly computed. In the standard Bayesian context, KGD coincides with the kernel Stein discrepancy (KSD), and we obtain a novel characterisation of KSD as measuring the size of a variational gradient. Outside this familiar setting, KGD enables novel sampling algorithms to be developed and compared, even when unnormalised densities cannot be obtained. To illustrate this point, several novel algorithms are proposed, including a natural generalisation of Stein variational gradient descent, with applications to mean-field neural networks and prediction-centric uncertainty quantification. On the theoretical side, our principal contribution is to establish sufficient conditions for desirable properties of KGD, such as continuity and convergence control.
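As background, a minimal sketch of the standard Bayesian setting mentioned in the abstract (generic notation, not the paper's): there the entropy-regularised objective reduces to the KL divergence to an unnormalised target p, and the kernel Stein discrepancy over an RKHS \mathcal{H} takes the form
\[
\mathcal{F}(q) \;=\; \mathbb{E}_{x\sim q}\bigl[\log q(x) - \log p(x)\bigr],
\qquad
\mathrm{KSD}(q\,\|\,p) \;=\; \sup_{\|f\|_{\mathcal{H}^d}\le 1} \mathbb{E}_{x\sim q}\bigl[\nabla_x \log p(x)^\top f(x) + \nabla_x\!\cdot f(x)\bigr],
\]
so that KSD can be read as measuring the size of a gradient of \mathcal{F}; the proposed KGD is the analogous quantity for objectives whose minimiser admits no explicit unnormalised density.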
- Asia > Japan > Honshū > Kantō > Kanagawa Prefecture (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
- (4 more...)
- Instructional Material (0.67)
- Research Report (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.87)
Export Reviews, Discussions, Author Feedback and Meta-Reviews
The paper presents some extensions to Pentina and Lampert's PAC-Bayesian analysis of "Lifelong Learning" problems (ICML 2014), where a learner must adapt to various tasks by exploiting knowledge from previously seen ones. The main contributions are risk bounds dedicated to two scenarios where the observed tasks are not sampled independently from each other. Roughly speaking, the first scenario shares similarities with domain adaptation (albeit the risk bound is given on an average over all possible domains, instead of on a specific target domain) and the second is, to my knowledge, quasi-identical to distribution drift. In the first setting (Section 3), the authors cleverly reuse Ralaivola et al.'s chromatic PAC-Bayesian theory to represent dependencies between tasks. However, this result alone leaves me unsatisfied. I wonder to what extent this result can be useful for the ambitious "lifelong learning" problem the authors are interested in.
Reviews: Learning Bound for Parameter Transfer Learning
The parameter transfer learning framework described in the paper is very interesting and deserves attention. The approach taken by the authors (described in Section 2) is sound but lacks clarity. The notation is well chosen, but not always properly explained (see my "Specific comments" below). Also, as the transfer learning framework is very similar to domain adaptation, which is studied in many papers (e.g., Ben-David et al. 2007, cited by the authors), it would be interesting to discuss the connection of Theorem 1 with existing domain adaptation results. Section 3 is difficult to follow for a reader not familiar with sparse coding (like myself).
Reviews: Low-rank Interaction with Sparse Additive Effects Model for Large Data Frames
Summary
-------
This paper introduces a new statistical model for matrices of heterogeneous data (data frames) based on the exponential family. The features of this model are: i) modeling additive effects in a sparse way, ii) modeling low-rank interactions. The parameters of this model are then estimated by maximizing the likelihood with sparse and low-rank regularizations. In addition, this work comes with statistical guarantees and convergence guarantees for the proposed optimization algorithm. Numerical experiments conclude the manuscript.

Quality
-------
This paper is mathematically rigorous and technically sound.
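For concreteness, a generic sketch of the estimator class described above (illustrative notation, not the paper's): with data-frame entries Y_{ij} drawn from exponential-family distributions with natural parameters \Theta_{ij}, one penalises the likelihood so that the additive-effects part is sparse and the interaction part is low-rank, e.g.
\[
(\hat{\alpha}, \hat{L}) \;\in\; \arg\min_{\alpha,\,L}\; -\log \mathcal{L}\bigl(A(\alpha) + L;\, Y\bigr) \;+\; \lambda_1 \|\alpha\|_1 \;+\; \lambda_2 \|L\|_*,
\]
where A(\alpha) collects the additive effects, the \ell_1 penalty enforces their sparsity, and the nuclear norm \|L\|_* serves as a convex surrogate for the rank of the interaction matrix L.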