Identification and Estimation for Nonignorable Missing Data: A Data Fusion Approach

Zixiao Wang, AmirEmad Ghassami, Ilya Shpitser

arXiv.org Artificial Intelligence 

Missing data is a pervasive and challenging issue in many applications of statistical inference, including healthcare, economics, and the social sciences. Data are said to be Missing at Random (MAR) when the mechanism of missingness depends only on the observed data. Strategies for handling MAR data have been extensively investigated in the literature (Dempster et al., 1977; Robins et al., 1994; Tsiatis, 2006; Little and Rubin, 2019). In many practical settings, however, MAR is not a realistic assumption. Instead, missingness often depends on variables that are themselves missing. Such settings are said to exhibit nonignorable missingness, with the resulting data being Missing Not at Random (MNAR) (Fielding et al., 2008; Schafer and Graham, 2002). A classic example of MNAR data occurs in longitudinal studies: because of the treatment's toxicity, some patients may become too ill to visit the clinic, so that patients whose circumstances are associated with the outcome are more likely to be lost to follow-up (Ibrahim et al., 2012). Previous MNAR models have typically imposed constraints on the target distribution and its missingness mechanism to ensure that the parameter of interest can be identified. This approach goes back to the work of Heckman (1979), who proposed an outcome-selection model based on parametric modeling of the outcome variable and the missingness pattern. Little (1993) introduced the pattern-mixture model, in which one needs to specify the distribution for each missing data pattern independently.
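To make the distinction between MAR and MNAR concrete, the following is a minimal simulation sketch using only the Python standard library. It is an illustration under assumed settings, not a method from this paper: the binary covariate `x`, the outcome model, the logistic missingness mechanisms, and the function names `stratified_mean` and `simulate` are all hypothetical choices. The sketch applies a simple covariate-stratified estimator of the outcome mean, which is valid under MAR given `x`, to data generated under each mechanism.

```python
import math
import random
import statistics


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def stratified_mean(x, y, r):
    """Estimate E[Y] as sum_x P(X=x) * mean(Y | X=x, R=1).

    This adjustment is justified when missingness is MAR given X;
    it uses Y only where the response indicator r is True.
    """
    n = len(x)
    total = 0.0
    for level in set(x):
        observed = [yi for xi, yi, ri in zip(x, y, r) if xi == level and ri]
        weight = sum(1 for xi in x if xi == level) / n
        total += weight * statistics.fmean(observed)
    return total


def simulate(n=50000, seed=0):
    rng = random.Random(seed)
    # Hypothetical data-generating process: X is binary, Y = X + noise.
    x = [rng.randint(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0.0, 1.0) for xi in x]

    # MAR: the chance of observing Y depends only on the observed X.
    r_mar = [rng.random() < sigmoid(2 * xi - 1) for xi in x]
    # MNAR: the chance of observing Y depends on Y itself.
    r_mnar = [rng.random() < sigmoid(2 * yi - 1) for yi in y]

    return {
        "full": statistics.fmean(y),  # benchmark using all outcomes
        "mar_adjusted": stratified_mean(x, y, r_mar),
        "mnar_adjusted": stratified_mean(x, y, r_mnar),
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

In this setup the adjusted estimate under MAR should land close to the full-data mean, while the same estimator applied under MNAR remains biased upward, because units with larger outcomes are more likely to be observed even within each level of `x`. This is the gap that additional identifying assumptions or auxiliary data are meant to close.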