Review for NeurIPS paper: Debiasing Averaged Stochastic Gradient Descent to handle missing values

Neural Information Processing Systems 

A key quantity in the analysis is p_j, the probability that the j-th feature is missing. Is this quantity given in advance? Under MCAR it can be easily estimated, but some clarification is needed. Essentially, this method is just SGD with inverse probability weighting (IPW), albeit coordinate-wise IPW rather than the regular IPW.
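
To make the question concrete, here is a minimal sketch of what I have in mind. It is not the authors' algorithm. It assumes MCAR, encodes missing entries as NaN, and follows the convention above that p_j is the probability of missingness (so a feature is observed with probability 1 - p_j). The function names and the simulated data are hypothetical, chosen only for illustration.

```python
import numpy as np

def estimate_missing_prob(X):
    """Per-feature fraction of NaN entries: a plug-in estimate of p_j under MCAR."""
    return np.isnan(X).mean(axis=0)

def coordinatewise_ipw(X, p_missing):
    """Zero-impute missing entries, then rescale coordinate j by 1 / (1 - p_j)."""
    observed = ~np.isnan(X)
    return np.where(observed, X, 0.0) / (1.0 - p_missing)

# Simulated data (assumption: independent Gaussian features, MCAR mask).
rng = np.random.default_rng(0)
n, d = 200_000, 3
true_mean = np.array([1.0, -2.0, 0.5])
p_true = np.array([0.1, 0.3, 0.5])

X_full = rng.normal(loc=true_mean, size=(n, d))
is_missing = rng.random((n, d)) < p_true
X = np.where(is_missing, np.nan, X_full)

p_hat = estimate_missing_prob(X)
X_ipw = coordinatewise_ipw(X, p_hat)

print("estimated p_j:", p_hat)
print("column means after reweighting:", X_ipw.mean(axis=0))
```

In this sketch each coordinate receives its own weight 1 / (1 - p_j), whereas regular IPW would assign a single weight to the whole sample. If p_j is not known in advance, the plug-in estimate above is the obvious candidate, and the paper should state whether the analysis covers that case.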