Shpitser, Ilya
Zero Inflation as a Missing Data Problem: a Proxy-based Approach
Phung, Trung, Lee, Jaron J. R., Oladapo-Shittu, Opeyemi, Klein, Eili Y., Gurses, Ayse Pinar, Hannum, Susan M., Weems, Kimberly, Marsteller, Jill A., Cosgrove, Sara E., Keller, Sara C., Shpitser, Ilya
A common type of zero-inflated data has certain true values incorrectly replaced by zeros due to data recording conventions (rare outcomes assumed to be absent) or details of data recording equipment (e.g. artificial zeros in gene expression data). Existing methods for zero-inflated data either fit the observed data likelihood via parametric mixture models that explicitly represent excess zeros, or aim to replace excess zeros by imputed values. If the goal of the analysis relies on knowing true data realizations, a particular challenge with zero-inflated data is identifiability, since it is difficult to correctly determine which observed zeros are real and which are inflated. This paper views zero-inflated data as a general type of missing data problem, where the observability indicator for a potentially censored variable is itself unobserved whenever a zero is recorded. We show that, without additional assumptions, target parameters involving a zero-inflated variable are not identified. However, if a proxy of the missingness indicator is observed, a modification of the effect restoration approach of Kuroki and Pearl allows identification and estimation, provided the proxy-indicator relationship is known. If this relationship is unknown, our approach yields a partial identification strategy for sensitivity analysis. Specifically, we show that only certain proxy-indicator relationships are compatible with the observed data distribution. We give an analytic bound for this relationship in cases with a categorical outcome, which is sharp in certain models. For more complex cases, sharp numerical bounds may be computed using the methods of Duarte et al. [2023]. We illustrate our method via simulation studies and a data application on central line-associated bloodstream infections (CLABSIs).
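As a minimal numerical sketch of the matrix-adjustment idea behind effect restoration (not the paper's estimator), suppose the proxy W of the missingness indicator R has a known misclassification matrix M with entries P(W = w | R = r). Among recorded zeros, inverting M recovers the probability that a recorded zero is a true zero. The Poisson outcome, the proxy mechanism, and all numerical values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# True (latent) variable X and observability indicator R (illustrative simulation).
X = rng.poisson(1.0, size=n)              # true counts
R = rng.binomial(1, 0.7, size=n)          # R=1: value recorded, R=0: replaced by zero
X_obs = np.where(R == 1, X, 0)            # zero-inflated observation

# Binary proxy W of R with a KNOWN misclassification matrix M[w, r] = P(W=w | R=r).
M = np.array([[0.8, 0.1],                 # P(W=0 | R=0), P(W=0 | R=1)
              [0.2, 0.9]])                # P(W=1 | R=0), P(W=1 | R=1)
W = np.where(rng.uniform(size=n) < M[1, R], 1, 0)

# Effect-restoration step (Kuroki & Pearl style): among recorded zeros,
# P(W | X_obs=0) = M @ P(R | X_obs=0), so inverting M recovers P(R | X_obs=0).
zeros = X_obs == 0
p_w_given_zero = np.array([np.mean(W[zeros] == 0), np.mean(W[zeros] == 1)])
p_r_given_zero = np.linalg.solve(M, p_w_given_zero)

print("Estimated P(R=1 | recorded zero):", p_r_given_zero[1])
print("True      P(R=1 | recorded zero):", np.mean(R[zeros] == 1))
```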
Evaluation of Active Feature Acquisition Methods for Time-varying Feature Settings
von Kleist, Henrik, Zamanian, Alireza, Shpitser, Ilya, Ahmidi, Narges
Machine learning methods often assume that input features are available at no cost. However, in domains like healthcare, where acquiring features can be expensive or harmful, it is necessary to balance a feature's acquisition cost against its predictive value. The task of training an AI agent to decide which features to acquire is called active feature acquisition (AFA). By deploying an AFA agent, we effectively alter the acquisition strategy and trigger a distribution shift. To safely deploy AFA agents under this distribution shift, we present the problem of active feature acquisition performance evaluation (AFAPE). We examine AFAPE under i) a no direct effect (NDE) assumption, stating that acquisitions do not affect the underlying feature values; and ii) a no unobserved confounding (NUC) assumption, stating that retrospective feature acquisition decisions were based only on observed features. We show that one can apply offline reinforcement learning under the NUC assumption and missing data methods under the NDE assumption. When both NUC and NDE hold, we propose a novel semi-offline reinforcement learning framework, which requires a weaker positivity assumption and yields more data-efficient estimators. We introduce three novel estimators: a direct method (DM), an inverse probability weighting (IPW), and a double reinforcement learning (DRL) estimator.
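As a stylized illustration of the IPW idea (assuming a single acquisition decision per instance rather than the time-varying setting of the paper), retrospective acquisition decisions can be reweighted by the ratio of the target AFA policy to the logging policy. The logging probabilities, cost structure, and target policy below are invented for the example and are not the paper's semi-offline estimators.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Retrospective data: context x, acquisition decision a ~ b(a|x) (e.g. a past physician
# policy), and an incurred loss combining prediction error and acquisition cost.
x = rng.binomial(1, 0.5, size=n)
b = np.where(x == 1, 0.8, 0.3)               # logging (behaviour) acquisition probabilities
a = rng.binomial(1, b)                        # 1 = feature acquired
loss = 2.0 - 1.5 * a * x + 0.5 * a            # acquiring helps when x == 1, always costs 0.5

# Target AFA policy to evaluate: acquire only when x == 1.
pi = np.where(x == 1, 1.0, 0.0)

# One-step IPW (importance sampling) estimate of the loss the AFA policy would incur.
w = np.where(a == 1, pi / b, (1 - pi) / (1 - b))
v_ipw = np.mean(w * loss)
print("IPW estimate of expected loss under the AFA policy:", v_ipw)
```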
Evaluation of Active Feature Acquisition Methods for Static Feature Settings
von Kleist, Henrik, Zamanian, Alireza, Shpitser, Ilya, Ahmidi, Narges
Machine learning (ML) methods generally assume that the complete set of input features is readily available at deployment, typically at little to no cost. However, this assumption does not hold universally, especially in scenarios where feature acquisitions carry substantial costs. In contexts like medical diagnostics, the cost of acquiring certain features, such as X-rays or biopsies, encompasses not only financial expense but also potential risks to patient well-being. In such cases, the cost or harm of the feature acquisition should be balanced against the predictive value of the feature. Active feature acquisition (AFA) addresses this problem by training two AI components: i) the "AFA agent," an AI system tasked with determining which features should be observed, and ii) an ML prediction model that undertakes the prediction task based on the acquired feature set. While missingness in the retrospective dataset was effectively determined by, for example, a physician, missingness at deployment is determined by the AFA agent, thereby leading to a missingness distribution shift. In our companion paper [1], we formulate the problem of active feature acquisition performance evaluation (AFAPE), which addresses the task of estimating, from the retrospective dataset, the performance an AFA agent would have at deployment. Consequently, upon solving the AFAPE problem, the physician will be well-informed about the expected rates of incorrect diagnoses and the average costs of feature acquisitions when the AFA system is put into operation.
Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach
Wang, Zixiao, Ghassami, AmirEmad, Shpitser, Ilya
Missing data is a pervasive and challenging issue in many applications of statistical inference, such as healthcare, economics, and the social sciences. Data are said to be Missing at Random (MAR) when the mechanism of missingness depends only on the observed data. Strategies to deal with MAR have been extensively investigated in the literature (Dempster et al., 1977; Robins et al., 1994; Tsiatis, 2006; Little and Rubin, 2019). In many practical settings, however, MAR is not a realistic assumption. Instead, missingness often depends on variables that are themselves missing. Such settings are said to exhibit nonignorable missingness, with the resulting data being Missing Not at Random (MNAR) (Fielding et al., 2008; Schafer and Graham, 2002). A classic example of MNAR data arises in longitudinal studies: due to a treatment's toxicity, some patients may become too ill to visit the clinic, so that patients whose outcomes are associated with these circumstances are more likely to be lost to follow-up (Ibrahim et al., 2012). Previous MNAR models have typically imposed constraints on the target distribution and its missingness mechanism to ensure that the parameter of interest is identified. This approach goes back to the work of Heckman (1979), who proposed an outcome-selection model based on parametric modeling of the outcome variable and the missingness mechanism. Little (1993) introduced the pattern-mixture model, in which the distribution of the data under each missing-data pattern is specified separately.
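For concreteness, the two classical factorizations mentioned above can be written, for a single outcome Y with missingness indicator R, as follows (a standard textbook sketch, not notation from this paper):

```latex
% Selection-model vs. pattern-mixture factorizations of the joint law of an
% outcome Y and its missingness indicator R (standard sketch, not this paper's notation).
\begin{align*}
\underbrace{p(y, r) = p(y)\, p(r \mid y)}_{\text{selection model (Heckman, 1979)}}
\qquad \text{versus} \qquad
\underbrace{p(y, r) = p(y \mid r)\, p(r)}_{\text{pattern mixture (Little, 1993)}}
\end{align*}
```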
An Introduction to Causal Inference Methods for Observational Human-Robot Interaction Research
Lee, Jaron J. R., Ajaykumar, Gopika, Shpitser, Ilya, Huang, Chien-Ming
Quantitative methods in Human-Robot Interaction (HRI) research have primarily relied upon randomized, controlled experiments in laboratory settings. However, such experiments are not always feasible when external validity, ethical constraints, and ease of data collection are of concern. Furthermore, as consumer robots become increasingly available, growing amounts of real-world data will become available to HRI researchers, which prompts the need for quantitative approaches tailored to the analysis of observational data. In this article, we present an alternative approach to quantitative research for HRI researchers using methods from causal inference that can enable researchers to identify causal relationships in observational settings where randomized, controlled experiments cannot be run. We highlight different scenarios that HRI research with consumer household robots may involve in order to contextualize how methods from causal inference can be applied to observational HRI research. We then provide a tutorial summarizing key concepts from causal inference using a graphical model perspective and link to code examples throughout the article, which are available at https://gitlab.com/causal/causal_hri. Our work paves the way for further discussion of new approaches to observational HRI research while providing a starting point for HRI researchers to add causal inference techniques to their analytical toolbox.
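As a generic illustration of covariate adjustment on observational data of the kind the article advocates (this sketch is not taken from the linked repository; the variables, scenario, and numbers are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical observational HRI data: C = user tech-savviness confounds both
# A = whether a household-robot feature is used and Y = task success.
C = rng.binomial(1, 0.5, size=n)
A = rng.binomial(1, 0.2 + 0.6 * C)
Y = rng.binomial(1, 0.3 + 0.2 * A + 0.3 * C)

# A naive comparison of users and non-users is confounded by C.
naive = Y[A == 1].mean() - Y[A == 0].mean()

# Back-door (covariate) adjustment: E[Y | do(A=a)] = sum_c E[Y | A=a, C=c] p(c).
adjusted = sum(
    (Y[(A == 1) & (C == c)].mean() - Y[(A == 0) & (C == c)].mean()) * np.mean(C == c)
    for c in (0, 1)
)
print(f"naive difference: {naive:.3f}, adjusted effect: {adjusted:.3f} (truth: 0.200)")
```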
When does the ID algorithm fail?
Shpitser, Ilya
The ID algorithm solves the problem of identification of interventional distributions of the form p(Y | do(a)) in graphical causal models, and has been formulated in a number of ways [12, 9, 6]. The ID algorithm is sound (it outputs the correct functional of the observed data distribution whenever p(Y | do(a)) is identified in the causal model represented by the input graph) and complete (it explicitly flags as a failure any input p(Y | do(a)) that is not identified in the causal model represented by the input graph). The reference [9] provides a result, the so-called "hedge criterion" (Corollary 3), which aims to give a graphical characterization of the situations in which the ID algorithm fails to identify its input, in terms of a structure in the input graph called the hedge. While the ID algorithm is indeed sound and complete, and the hedge structure does arise whenever the input distribution is not identified, Corollary 3 of [9] is incorrect as stated. In this note, I outline the modern presentation of the ID algorithm, discuss a simple counterexample to Corollary 3, and provide a number of graphical characterizations of the ID algorithm failing to identify its input distribution.
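Two standard textbook cases frame this discussion (they are background, not examples from this note): the simplest identified query and the smallest non-identified one.

```latex
% Back-door adjustment: if C is observed and blocks all back-door paths from A to Y,
% the ID algorithm returns the adjustment functional
\begin{align*}
p(y \mid \operatorname{do}(a)) = \sum_{c} p(y \mid a, c)\, p(c).
\end{align*}
% Bow-arc graph: with A -> Y and a hidden common cause of A and Y (A <-> Y),
% p(y | do(a)) is not identified, and the ID algorithm flags the query as a failure.
```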
Partial Identification of Causal Effects Using Proxy Variables
Ghassami, AmirEmad, Shpitser, Ilya, Tchetgen, Eric Tchetgen
Proximal causal inference is a recently proposed framework for evaluating the causal effect of a treatment on an outcome variable in the presence of unmeasured confounding (Miao et al., 2018; Tchetgen Tchetgen et al., 2020). For nonparametric point identification of causal effects, the framework leverages a pair of so-called treatment and outcome confounding proxy variables in order to identify a bridge function that matches the dependence of potential outcomes or treatment variables on the hidden factors to corresponding functions of observed proxies. Unique identification of a causal effect via a bridge function crucially requires that the proxies are sufficiently relevant for the hidden factors, a requirement previously formalized as a completeness condition. However, completeness is well known not to be empirically testable, and although a bridge function may be well-defined in a given setting, lack of completeness, sometimes manifested by the availability of only a single type of proxy, may severely limit prospects for identification of a bridge function and thus of a causal effect, thereby potentially restricting the application of the proximal causal framework. In this paper, we propose partial identification methods that do not require completeness and obviate the need for identification of a bridge function. That is, we establish that proxies of unobserved confounders can be leveraged to obtain bounds on the causal effect of the treatment on the outcome even if the available information does not suffice to identify either a bridge function or a corresponding causal effect of interest. We further establish analogous partial identification results in related settings where identification hinges upon hidden mediators for which proxies are available, but those proxies are not sufficiently rich for point identification of a bridge function or a corresponding causal effect of interest.
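For reference, the point-identification result that this partial-identification approach relaxes can be sketched in standard proximal notation (A treatment, Y outcome, X covariates, Z treatment-confounding proxy, W outcome-confounding proxy), following Miao et al. (2018) and Tchetgen Tchetgen et al. (2020):

```latex
% An outcome bridge function h solves the integral equation
\begin{align*}
E[Y \mid Z, A, X] = E[h(W, A, X) \mid Z, A, X],
\end{align*}
% and, under a completeness condition, the counterfactual mean is identified as
\begin{align*}
E[Y(a)] = E[h(W, a, X)].
\end{align*}
```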
Combining Experimental and Observational Data for Identification of Long-Term Causal Effects
Ghassami, AmirEmad, Shpitser, Ilya, Tchetgen, Eric Tchetgen
We consider the task of estimating the causal effect of a treatment variable on a long-term outcome variable using data from an observational domain and an experimental domain. The observational data are assumed to be confounded and hence, without further assumptions, this dataset alone cannot be used for causal inference. Moreover, only a short-term version of the primary outcome variable of interest is observed in the experimental data, and hence this dataset alone cannot be used for causal inference either. In recent work, Athey et al. (2020) proposed a method for systematically combining such data to identify the downstream causal effect in view. Their approach is based on the assumptions of internal and external validity of the experimental data, together with an extra novel assumption called latent unconfoundedness. In this paper, we first review their proposed approach and discuss the latent unconfoundedness assumption. We then propose two alternative approaches to data fusion for estimating the average treatment effect as well as the effect of treatment on the treated. Our first proposed approach is based on assuming equi-confounding bias for the short-term and long-term outcomes. Our second proposed approach is based on the proximal causal inference framework, in which we assume the existence of an extra variable in the system that is a proxy of the latent confounder of the treatment-outcome relation.
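A stylized additive version of the equi-confounding idea may help fix intuition (this is a simplified sketch on the mean-difference scale under assumptions chosen for illustration; the paper's formulation is more general and also covers the effect of treatment on the treated):

```latex
% With S the short-term and Y the long-term outcome, define the observational
% confounding biases (ATE_S is identified from the experimental domain):
\begin{align*}
b_S &= \big(E_{\mathrm{obs}}[S \mid A = 1] - E_{\mathrm{obs}}[S \mid A = 0]\big) - \mathrm{ATE}_S, \\
b_Y &= \big(E_{\mathrm{obs}}[Y \mid A = 1] - E_{\mathrm{obs}}[Y \mid A = 0]\big) - \mathrm{ATE}_Y.
\end{align*}
% Additive equi-confounding posits b_Y = b_S, so the long-term effect is recovered as
\begin{align*}
\mathrm{ATE}_Y = \big(E_{\mathrm{obs}}[Y \mid A = 1] - E_{\mathrm{obs}}[Y \mid A = 0]\big) - b_S.
\end{align*}
```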
Robustness in Deep Learning for Computer Vision: Mind the gap?
Drenkow, Nathan, Sani, Numair, Shpitser, Ilya, Unberath, Mathias
Deep neural networks for computer vision tasks are deployed in increasingly safety-critical and socially impactful applications, motivating the need to close the gap in model performance under varied, naturally occurring imaging conditions. Robustness, a term used ambiguously in multiple contexts including adversarial machine learning, here refers to preserving model performance under naturally induced image corruptions or alterations. We perform a systematic review to identify, analyze, and summarize current definitions and progress towards non-adversarial robustness in deep learning for computer vision. We find that this area of research has received disproportionately little attention relative to adversarial machine learning, yet a significant robustness gap exists that often manifests in performance degradation similar in magnitude to adversarial conditions. To provide a more transparent definition of robustness across contexts, we introduce a structural causal model of the data generating process and interpret non-adversarial robustness as pertaining to a model's behavior on corrupted images which correspond to low-probability samples from the unaltered data distribution. We then identify key architecture, data augmentation, and optimization tactics for improving neural network robustness. This causal view of robustness reveals that common practices in the current literature, with regard to both robustness tactics and evaluations, correspond to causal concepts such as soft interventions resulting in a counterfactually altered distribution of imaging conditions. Through our findings and analysis, we offer perspectives on how future research may mind this evident and significant non-adversarial robustness gap.
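A toy simulation of the causal view described above (the structural model, corruption variable, and threshold "model" below are illustrative assumptions, not taken from the review): shifting the distribution of an imaging-condition variable acts as a soft intervention and degrades a fixed model's accuracy.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Toy structural causal model of the imaging process: scene label S,
# imaging condition C (e.g. blur level), observed feature X = S + noise(C).
S = rng.binomial(1, 0.5, size=n)

def observe(C):
    return S + rng.normal(0, 0.3 + C, size=n)     # heavier corruption for larger C

def accuracy(C):
    X = observe(C)
    return np.mean((X > 0.5) == S)                 # fixed threshold "model"

# Nominal conditions vs. a soft intervention shifting the corruption distribution.
C_nominal = rng.gamma(1.0, 0.1, size=n)            # mostly mild corruption
C_shifted = rng.gamma(1.0, 0.6, size=n)            # low-probability, corrupted conditions
print("accuracy, nominal conditions :", accuracy(C_nominal))
print("accuracy, shifted conditions :", accuracy(C_shifted))
```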
Proximal Causal Inference with Hidden Mediators: Front-Door and Related Mediation Problems
Ghassami, AmirEmad, Shpitser, Ilya, Tchetgen, Eric Tchetgen
Proximal causal inference was recently proposed as a framework to identify causal effects from observational data in the presence of hidden confounders for which proxies are available. In this paper, we extend the proximal causal approach to settings where identification of causal effects hinges upon a set of mediators which are unfortunately not directly observed, but for which proxies are measured. Specifically, we establish (i) a new hidden front-door criterion, which extends the classical front-door result to allow for hidden mediators for which proxies are available; and (ii) an extension of causal mediation analysis that identifies direct and indirect causal effects under unconfoundedness conditions in a setting where the mediator in view is hidden, but error-prone proxies of it are available. We view (i) and (ii) as important steps towards the practical application of front-door criteria and mediation analysis, as mediators are almost always error prone and thus the most one can hope for in practice is that our measurements are at best proxies of mediating mechanisms. Finally, we show that identification of certain causal effects remains possible even in settings where the challenges in (i) and (ii) co-exist.
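For reference, the classical front-door functional that the hidden front-door criterion generalizes, stated for a fully observed mediator M:

```latex
% Classical front-door formula with observed mediator M (standard background result).
\begin{align*}
p(y \mid \operatorname{do}(a)) = \sum_{m} p(m \mid a) \sum_{a'} p(y \mid m, a')\, p(a').
\end{align*}
```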