Goto

Collaborating Authors

 study variable


Gaussian Copula Models for Nonignorable Missing Data Using Auxiliary Marginal Quantiles

Feldman, Joseph, Reiter, Jerome P., Kowal, Daniel R.

arXiv.org Machine Learning

We present an approach for modeling and imputation of nonignorable missing data under Gaussian copulas. The analyst posits a set of quantiles of the marginal distributions of the study variables, for example, reflecting information from external data sources or elicited expert opinion. When these quantiles are accurately specified, we prove it is possible to consistently estimate the copula correlation and perform multiple imputation in the presence of nonignorable missing data. We develop algorithms for estimation and imputation that are computationally efficient, which we evaluate in simulation studies of multiple imputation inferences. We apply the model to analyze associations between lead exposure levels and end-of-grade test scores for 170,000 students in North Carolina. These measurements are not missing at random, as children deemed at-risk for high lead exposure are more likely to be measured. We construct plausible marginal quantiles for lead exposure using national statistics provided by the Centers for Disease Control and Prevention. Complete cases and missing at random analyses appear to underestimate the relationships between certain variables and end-of-grade test scores, while multiple imputation inferences under our model support stronger adverse associations between lead exposure and educational outcomes.


Asymptotically Unbiased Off-Policy Policy Evaluation when Reusing Old Data in Nonstationary Environments

Liu, Vincent, Chandak, Yash, Thomas, Philip, White, Martha

arXiv.org Artificial Intelligence

In this work, we consider the off-policy policy evaluation problem for contextual bandits and finite horizon reinforcement learning in the nonstationary setting. Reusing old data is critical for policy evaluation, but existing estimators that reuse old data introduce large bias such that we can not obtain a valid confidence interval. Inspired from a related field called survey sampling, we introduce a variant of the doubly robust (DR) estimator, called the regression-assisted DR estimator, that can incorporate the past data without introducing a large bias. The estimator unifies several existing off-policy policy evaluation methods and improves on them with the use of auxiliary information and a regression approach. We prove that the new estimator is asymptotically unbiased, and provide a consistent variance estimator to a construct a large sample confidence interval. Finally, we empirically show that the new estimator improves estimation for the current and future policy values, and provides a tight and valid interval estimation in several nonstationary recommendation environments.


A new nonparametric interpoint distance-based measure for assessment of clustering

Modak, Soumita

arXiv.org Artificial Intelligence

A new interpoint distance-based measure is proposed to identify the optimal number of clusters present in a data set. Designed in nonparametric approach, it is independent of the distribution of given data. Interpoint distances between the data members make our cluster validity index applicable to univariate and multivariate data measured on arbitrary scales, or having observations in any dimensional space where the number of study variables can be even larger than the sample size. Our proposed criterion is compatible with any clustering algorithm, and can be used to determine the unknown number of clusters or to assess the quality of the resulting clusters for a data set. Demonstration through synthetic and real-life data establishes its superiority over the well-known clustering accuracy measures of the literature.