Unsupervised or Indirectly Supervised Learning
Review for NeurIPS paper: Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Additional Feedback:
Overall comment: Although I enjoyed reading the paper and it proposes novel ideas for PU learning research, I could not give a high score because I feel it is hard to compare the methods in the experiments due to the use of different models for the proposed approach and the baselines, and because some of the work in this paper (Sec.
Other comments: The output of a logistic classifier lies between 0 and 1, and theoretically it should be an estimate of p(y | x). In practice, the estimate of p(y | x) can become quite noisy, or the model may overfit and produce peaky \hat{p}(y | x) distributions, as discussed in papers like "On Calibration of Modern Neural Networks" (ICML 2017). Assuming \hat{\sigma}(x) = p_{tr}(y = -1 | x) seems to be a strong assumption; does this cause any issues in the experiments? A minor suggestion is to investigate confidence calibration and see how sensitive the final PU classifier is to worse calibration.
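To make the calibration suggestion concrete, here is a minimal sketch of temperature scaling in the spirit of Guo et al. (ICML 2017). The tensor names, the held-out validation split, and the LBFGS fitting choice are assumptions made for illustration; nothing here is taken from the reviewed paper.

```python
# Minimal temperature-scaling sketch (Guo et al., ICML 2017).
# Assumes `val_logits` (N x 2) and `val_labels` (N,) come from a held-out
# validation split; these names are illustrative, not from the reviewed paper.
import torch
import torch.nn.functional as F

def fit_temperature(val_logits, val_labels, max_iter=100):
    """Fit a single scalar temperature T by minimizing NLL on validation data."""
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.LBFGS([log_t], lr=0.1, max_iter=max_iter)

    def closure():
        optimizer.zero_grad()
        loss = F.cross_entropy(val_logits / log_t.exp(), val_labels)
        loss.backward()
        return loss

    optimizer.step(closure)
    return log_t.exp().item()

# Calibrated estimate of p(y | x) for downstream use, e.g. for \hat{\sigma}(x):
# probs = torch.softmax(test_logits / T, dim=1)
```

One could then compare the final PU classifier built on raw versus temperature-scaled estimates of p(y | x) to gauge the sensitivity the review asks about.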
Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Positive-unlabeled (PU) learning trains a binary classifier using only positive and unlabeled data. A common simplifying assumption is that the positive data is representative of the target positive class. This assumption rarely holds in practice due to temporal drift, domain shift, and/or adversarial manipulation. This paper shows that PU learning is possible even with arbitrarily non-representative positive data given unlabeled data from the source and target distributions. Our key insight is that only the negative class's distribution need be fixed. We integrate this into two statistically consistent methods to address arbitrary positive bias - one approach combines negative-unlabeled learning with unlabeled-unlabeled learning while the other uses a novel, recursive risk estimator. Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive bias, including disjoint positive class-conditional supports. Additionally, we propose a general, simplified approach to address PU risk estimation overfitting.
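For background, the overfitting issue mentioned at the end of the abstract is usually discussed in terms of the non-negative PU (nnPU) risk estimator of Kiryo et al. (2017). The sketch below shows only that standard estimator; it is not the paper's recursive estimator or its simplified correction, and the sigmoid surrogate loss and variable names are illustrative assumptions.

```python
# Sketch of the standard non-negative PU (nnPU) risk estimator (Kiryo et al., 2017).
# `g_pos`, `g_unl` are classifier outputs on positive and unlabeled batches, and
# `prior` is the positive-class prior pi = p(y = +1); all names are illustrative.
import torch

def sigmoid_loss(z):
    """Surrogate loss l(z) = sigmoid(-z), which satisfies l(z) + l(-z) = 1."""
    return torch.sigmoid(-z)

def nnpu_risk(g_pos, g_unl, prior):
    # pi * E_p[l(g(x), +1)]
    risk_pos = prior * sigmoid_loss(g_pos).mean()
    # Negative risk estimated from unlabeled and positive data:
    # E_u[l(g(x), -1)] - pi * E_p[l(g(x), -1)]
    risk_neg = sigmoid_loss(-g_unl).mean() - prior * sigmoid_loss(-g_pos).mean()
    # Non-negativity clamp on the estimated negative risk.
    return risk_pos + torch.clamp(risk_neg, min=0.0)
```

The clamp guards against the empirical negative risk going below zero, which is the overfitting symptom such corrections are designed to curb.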
Review for NeurIPS paper: Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Following the author response and discussion, all reviewers had an overall positive impression of the paper, highlighting some salient features:
- studies an interesting and under-explored problem setting, namely, PU learning where the positive samples are from a distribution unrelated to that of the target distribution
- the proposed method is equipped with theoretical guarantees, and is demonstrated to perform well empirically
Some areas for improvement include:
- the lack of comparison against bPU. The argument that such techniques make an assumption that does not hold is fine, but how well do they perform on the tasks considered here?
The authors are encouraged to incorporate these in a revised version.
Reviews: Are Labels Required for Improving Adversarial Robustness?
This paper combines the two and conducts adversarial training by applying the regularization term on unlabeled data. The theoretical and empirical analyses are new. However, some crucial points are not well addressed. It is observed that the proposed method (UAT) behaves differently on CIFAR10 and SVHN (Fig. 1) with respect to m. On CIFAR10, it outperforms the others starting from m = 4k.
Are Labels Required for Improving Adversarial Robustness?
Jean-Baptiste Alayrac, Jonathan Uesato, Po-Sen Huang, Alhussein Fawzi, Robert Stanforth, Pushmeet Kohli
Recent work has uncovered the interesting (and somewhat surprising) finding that training models to be invariant to adversarial perturbations requires substantially larger datasets than those required for standard classification. This result is a key hurdle in the deployment of robust machine learning models in many real world applications where labeled data is expensive. Our main insight is that unlabeled data can be a competitive alternative to labeled data for training adversarially robust models. Theoretically, we show that in a simple statistical setting, the sample complexity for learning an adversarially robust model from unlabeled data matches the fully supervised case up to constant factors. On standard datasets like CIFAR-10, a simple Unsupervised Adversarial Training (UAT) approach using unlabeled data improves robust accuracy by 21.7% over using 4K supervised examples alone, and captures over 95% of the improvement from the same number of labeled examples. Finally, we report an improvement of 4% over the previous state-of-the-art on CIFAR-10 against the strongest known attack by using additional unlabeled data from the uncurated 80 Million Tiny Images dataset. This demonstrates that our finding extends as well to the more realistic case where unlabeled data is also uncurated, therefore opening a new avenue for improving adversarial training.
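As a rough illustration of how unlabeled data can enter adversarial training, the sketch below applies an adversarial smoothness penalty on unlabeled inputs: each input is perturbed to maximize the divergence between the model's predictions before and after perturbation, and that divergence is then penalized. The PGD-style inner loop, the KL form, and the epsilon/step-size values are assumptions; this is not claimed to reproduce the paper's exact UAT objectives.

```python
# Sketch of an unsupervised adversarial smoothness term on unlabeled data.
# Step count, step size, epsilon, and the KL form are illustrative assumptions.
import torch
import torch.nn.functional as F

def unsup_adv_loss(model, x_unlabeled, eps=8 / 255, step_size=2 / 255, steps=10):
    # Predictions on the clean unlabeled batch serve as the (fixed) targets.
    with torch.no_grad():
        p_clean = F.softmax(model(x_unlabeled), dim=1)

    # Find a perturbation that maximizes the prediction divergence.
    delta = torch.zeros_like(x_unlabeled).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        kl = F.kl_div(F.log_softmax(model(x_unlabeled + delta), dim=1),
                      p_clean, reduction="batchmean")
        grad, = torch.autograd.grad(kl, delta)
        delta = (delta + step_size * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)

    # Smoothness penalty, backpropagated into the model parameters by the caller.
    x_adv = (x_unlabeled + delta).detach()
    return F.kl_div(F.log_softmax(model(x_adv), dim=1), p_clean, reduction="batchmean")
```

In practice this term would be added to a standard supervised loss on the labeled examples, with the unlabeled batches supplying only the smoothness signal.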
Review for NeurIPS paper: Uncertainty Aware Semi-Supervised Learning on Graph Data
Clarity: Overall the paper is very clear. The authors did an excellent job. Equation 5 - I am confused about a few things. The notation P(y | x; theta) is confusing because the semicolon implies that theta is a vector and not a random vector; however, the conditional distribution of theta is given as P(theta | G). So what is the point of the semicolon? Also, I think there is a typo in Equation 5 because the entropy term is not defined correctly.
Review for NeurIPS paper: Uncertainty Aware Semi-Supervised Learning on Graph Data
R#2 and R#3 generally liked the paper. R#1 has a brief review that raised a concern about the novelty of the method. The rebuttal addressed the concerns well and led all reviewers to increase their scores. We have collected comments from an additional reviewer, who pointed out more issues with the writing and the theoretical results (see below). We advise the authors to make an effort to address these issues in the revision.
Not All Out-of-Distribution Data Are Harmful to Open-Set Active Learning
Yang Yang, Nanjing University of Science and Technology
Active learning (AL) methods have proven to be an effective way to reduce labeling effort by intelligently selecting valuable instances for annotation. Despite their great success in in-distribution (ID) scenarios, AL methods suffer from performance degradation in many real-world applications because out-of-distribution (OOD) instances are inevitably contained in the unlabeled data, which may lead to inefficient sampling. Therefore, several attempts have explored open-set AL by strategically selecting pure ID instances while filtering out OOD instances. However, concentrating solely on selecting pseudo-ID instances may constrain the training of the ID classifier and the OOD detector. To address this issue, we propose a simple yet effective sampling scheme, Progressive Active Learning (PAL), which employs a progressive sampling mechanism to leverage the active selection of valuable OOD instances. The proposed PAL measures unlabeled instances by synergistically evaluating their informativeness and representativeness, and thus it can balance the pseudo-ID and pseudo-OOD instances in each round to enhance the capacity of both the ID classifier and the OOD detector. Extensive experiments on various open-set AL scenarios demonstrate the effectiveness of the proposed PAL compared with state-of-the-art methods. The code is available at https://github.com/njustkmg/PAL.
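The abstract does not spell out the scoring formulas, so the sketch below only illustrates the general shape of such a sampling round: rank unlabeled instances by a combined informativeness/representativeness score and reserve part of the budget for pseudo-OOD picks. Predictive entropy, distance to the labeled feature set, the additive mix, and the OOD fraction are all assumptions for illustration, not PAL's actual criteria.

```python
# Illustrative open-set sampling round: score unlabeled instances by
# informativeness (here: predictive entropy, an assumption) and representativeness
# (here: distance to already-labeled features, an assumption), then keep a mix of
# pseudo-ID and pseudo-OOD picks. Not the actual PAL formulas.
import numpy as np

def select_round(probs, feats, labeled_feats, ood_scores, budget, ood_frac=0.2):
    # probs: (N, C) softmax outputs; feats: (N, D); labeled_feats: (M, D);
    # ood_scores: (N,) where higher means more OOD-like.
    entropy = -(probs * np.log(probs + 1e-12)).sum(axis=1)           # informativeness
    dists = np.linalg.norm(feats[:, None, :] - labeled_feats[None], axis=-1)
    representativeness = dists.min(axis=1)                           # far from labeled set
    score = entropy + representativeness                             # simple additive mix
    ranked = np.argsort(-score)                                      # best-scoring first
    is_ood = ood_scores[ranked] > np.median(ood_scores)
    n_ood = int(budget * ood_frac)                                   # budget reserved for OOD
    picks = list(ranked[is_ood][:n_ood]) + list(ranked[~is_ood][:budget - n_ood])
    return np.array(picks)
```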
Reviews: Leveraging Labeled and Unlabeled Data for Consistent Fair Binary Classification
Originality: -Although the Hardt paper has suggested the use of this approach, the paper claims that it is the first to actually show this is indeed possible.
Quality: -The assumptions made about the model are very well justified. The discussion after each assumption provides the context for why the assumption makes sense and why it is needed to study their model. As a result, these discussions provide very good intuition and set the stage for the proof.
Clarity: -Overall, the paper has a very smooth flow, whether it be the discussion of their assumptions or their remarks.