Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
Missing Not At Random (MNAR) values, for which the probability of being missing may depend on the missing value itself, are notoriously difficult to account for in analyses, although they are very frequent in real data. One solution to handle MNAR data is to specify a model for the missing-data mechanism, which makes inference and imputation tasks more complex. Furthermore, this requires a strong \textit{a priori} assumption on the parametric form of the distribution. However, some works have obtained guarantees on the estimation of parameters in the presence of MNAR data without specifying the distribution of the missing data \citep{mohan2018estimation, tang2003analysis}. This is very useful in practice, but is limited to simple cases, such as a few self-masked MNAR variables in data generated according to linear regression models.
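To make the self-masked MNAR setting concrete, the following is a minimal sketch in Python (NumPy only) and is not taken from the paper. It assumes a logistic missingness mechanism in which larger values are more likely to be missing; the function name `self_masked_mnar` and the `slope` and `offset` parameters are hypothetical choices made for illustration. It shows why ignoring the mechanism is risky: the mean of the observed entries is biased downward relative to the mean of the full data.

```python
import numpy as np


def self_masked_mnar(x, slope=2.0, offset=0.0, seed=0):
    """Hide entries of x with a probability that depends on x itself.

    Illustrative assumption (not the paper's mechanism): a logistic
    self-masking rule, P(missing | x) = sigmoid(slope * x + offset).
    """
    rng = np.random.default_rng(seed)
    p_missing = 1.0 / (1.0 + np.exp(-(slope * x + offset)))
    mask = rng.random(x.shape) < p_missing  # True where the value is hidden
    x_obs = x.copy()
    x_obs[mask] = np.nan
    return x_obs, mask


# Simulate a single standard normal variable and mask it.
rng = np.random.default_rng(42)
x = rng.normal(loc=0.0, scale=1.0, size=10_000)
x_obs, mask = self_masked_mnar(x)

full_mean = x.mean()               # mean of the complete data
observed_mean = np.nanmean(x_obs)  # naive mean over observed entries only

print(f"fraction missing:     {mask.mean():.3f}")
print(f"mean of full data:    {full_mean:.3f}")
print(f"mean of observed data: {observed_mean:.3f}")
```

Because large values are preferentially hidden in this sketch, the naive observed mean falls well below the full-data mean, which is the kind of bias that MNAR-aware estimation aims to correct.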
Review for NeurIPS paper: Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
Additional Feedback: I have read the other reviews and the authors' feedback. The added recommender system experiment, which walks readers through (1) how the MNAR and PPCA models apply in this setting, (2) how to select the hyper-parameters for the imputation algorithm, and (3) how the imputations compare with prior algorithms, makes a strong case for the proposed method. If the authors rearrange the paper to improve clarity (as the reviews point out, and as they promise in their feedback), the paper can be substantially stronger. At a minimum, the authors should address two lingering questions from the reviews in the paper: (1) a discussion of a stage-wise approach to imputation (and why it may not be necessary for their sequence of regressions), and (2) given that some of the linear coefficients can be zero, what a practitioner should do when one of the regressions estimates a coefficient close to 0 that is then used in the denominator of other estimates. It would be even better if the illustration were grounded in an example such as movie item ratings.