A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

Neural Information Processing Systems 

We consider the situation in semi-supervised learning where the "label sampling" mechanism depends stochastically on the true response (and potentially also on the features). We suggest a method of moments for estimating this stochastic dependence using the unlabeled data. This is potentially useful for two distinct purposes: (a) as an input to a supervised learning procedure, where it can be used to "de-bias" results obtained from labeled data only, and (b) as a potentially interesting learning task in itself. We present several examples to illustrate the practical usefulness of our method.
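As a rough illustration of the method-of-moments idea, and not the implementation used in the paper, the sketch below makes several simplifying assumptions: the response is binary, the probability that a point is labeled depends on the response only, and the moment functions are g(x) = 1 and g(x) = x. Under these assumptions, weighting each labeled point by the inverse of its labeling probability should reproduce the moments of the full (labeled plus unlabeled) feature sample, which gives a small linear system in the inverse probabilities. The function name `estimate_label_rates` and the simulated data are hypothetical.

```python
import numpy as np


def estimate_label_rates(x, y, labeled):
    """Estimate P(labeled | y = k) for k in {0, 1} by matching moments.

    Assumed moment equations (one per moment function g_j):
        sum over labeled i of g_j(x_i) / p_{y_i}  =  sum over all i of g_j(x_i)
    These are linear in q_k = 1 / p_k. Only the responses of labeled
    points are used; the entries of y for unlabeled points are ignored.
    """
    # Moment functions g_1(x) = 1 and g_2(x) = x, evaluated on all points.
    g = np.column_stack([np.ones_like(x), x])
    # Right-hand side: moments of the full feature sample.
    b = g.sum(axis=0)
    # Column k holds the moments of the labeled points with response k.
    A = np.column_stack([g[labeled & (y == k)].sum(axis=0) for k in (0, 1)])
    # Solve A q = b for the inverse labeling probabilities q.
    q = np.linalg.solve(A, b)
    return 1.0 / q


# Simulated data (an assumption for illustration): the feature mean shifts with y.
rng = np.random.default_rng(0)
n = 200_000
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0)

# Labels are revealed with a probability that depends on the true response.
true_p = np.array([0.2, 0.6])
labeled = rng.random(n) < true_p[y]

p_hat = estimate_label_rates(x, y, labeled)
print("estimated labeling probabilities:", p_hat)
```

The system is solvable only when the chosen moment functions separate the classes; here that means the class-conditional means of x must differ. If the labeling probability also depended on the features, a richer parametric form and more moment functions would be needed.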