Missing Value Knockoffs
Coping with an increasing number of variables, optimizing predictive performance, and selecting among candidate scientific hypotheses are all valid reasons for using a variable selection algorithm. Another reality of today's datasets is missing values. Existing methods for handling missing values can, if applied directly, interfere with the assumptions of variable selection algorithms. In this work, we discuss how model-x knockoffs (Candes et al. 2017), a new approach to principled variable selection, can be applied to datasets that contain missing values. By principled variable selection we mean algorithms that aim to identify the Markov Blanket (MB) of a response variable (Tsamardinos and Aliferis 2003) while controlling false selections. Identifying the MB is optimal by definition, since the MB is the smallest subset of variables sufficient to describe the conditional distribution of the response variable. Controlling false selections means limiting how many variables are selected merely by random chance, which is especially important in applications where each selected variable corresponds to a scientific discovery. Model-x knockoffs provide a framework for repurposing existing statistical and machine learning feature scorers for MB discovery. When the assumptions of the model-x framework hold, the expected fraction of selected variables that are conditionally pairwise independent of the response variable (i.e., the false discovery rate) is controlled.
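To make the false-selection control concrete, here is a minimal NumPy sketch of the standard complete-data model-X knockoff filter. It assumes fully observed rows drawn from a Gaussian with known mean `mu` and correlation matrix `Sigma`, uses the equicorrelated Gaussian knockoff construction, a simple marginal-correlation statistic as the feature scorer, and the knockoff+ threshold. All function names are hypothetical, and the sketch does not implement the missing-value handling discussed in this work; it shows only the complete-data procedure.

```python
import numpy as np


def gaussian_knockoffs(X, mu, Sigma, rng):
    """Sample equicorrelated model-X knockoffs for rows of X ~ N(mu, Sigma).

    Assumes Sigma is a known correlation matrix (unit diagonal) and X has
    no missing entries. Uses X_tilde | X ~ N(X - (X - mu) Sigma^{-1} D,
    2D - D Sigma^{-1} D) with D = diag(s), s_j = min(2 * lambda_min, 1).
    """
    p = Sigma.shape[0]
    lam_min = np.linalg.eigvalsh(Sigma).min()
    D = np.diag(np.full(p, min(2.0 * lam_min, 1.0)))
    Sinv_D = np.linalg.solve(Sigma, D)            # Sigma^{-1} D
    cond_mean = X - (X - mu) @ Sinv_D             # row-wise conditional mean
    cond_cov = 2.0 * D - D @ Sinv_D
    cond_cov = (cond_cov + cond_cov.T) / 2.0      # remove round-off asymmetry
    # Matrix square root via eigh; clip tiny negative eigenvalues (round-off).
    w, V = np.linalg.eigh(cond_cov)
    root = V * np.sqrt(np.clip(w, 0.0, None))
    return cond_mean + rng.standard_normal(X.shape) @ root.T


def marginal_stats(X, X_tilde, y):
    """Hypothetical simple scorer: W_j = |X_j^T y| - |X_tilde_j^T y|.

    Swapping X_j with its knockoff flips the sign of W_j, as required.
    """
    return np.abs(X.T @ y) - np.abs(X_tilde.T @ y)


def knockoff_plus_threshold(W, q=0.1):
    """Smallest t > 0 with (1 + #{W_j <= -t}) / max(1, #{W_j >= t}) <= q."""
    W = np.asarray(W, dtype=float)
    for t in np.sort(np.abs(W[W != 0])):
        ratio = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if ratio <= q:
            return t
    return np.inf  # no valid threshold: select nothing


def knockoff_select(W, q=0.1):
    """Indices of selected variables at target false discovery rate q."""
    W = np.asarray(W, dtype=float)
    return np.flatnonzero(W >= knockoff_plus_threshold(W, q))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, p, k = 2000, 40, 12
    idx = np.arange(p)
    Sigma = 0.3 ** np.abs(idx[:, None] - idx[None, :])  # AR(1) correlation
    mu = np.zeros(p)
    X = rng.multivariate_normal(mu, Sigma, size=n)
    y = X[:, :k].sum(axis=1) + rng.standard_normal(n)  # first k are signals
    X_tilde = gaussian_knockoffs(X, mu, Sigma, rng)
    print(knockoff_select(marginal_stats(X, X_tilde, y), q=0.1))
```

Because the knockoff+ ratio always has at least 1 in its numerator, it can only be satisfied when at least 1/q variables are selected; with q = 0.1 and fewer than 10 strong signals, the filter may select nothing. Any other feature scorer (for example, lasso coefficient magnitudes) could replace `marginal_stats`, provided that swapping a variable with its knockoff flips the sign of its statistic.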
Feb-25-2022