 imbalanced semi-supervised learning


Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

Neural Information Processing Systems

While semi-supervised learning (SSL) has proven to be a promising way to leverage unlabeled data when labeled data is scarce, existing SSL algorithms typically assume that training class distributions are balanced. However, SSL algorithms trained under imbalanced class distributions can suffer severely when evaluated under a balanced testing criterion, since they rely on pseudo-labels of unlabeled data that are biased toward majority classes. To alleviate this issue, we formulate a convex optimization problem to softly refine the pseudo-labels generated from the biased model, and develop a simple algorithm, named Distribution Aligning Refinery of Pseudo-label (DARP), that solves it provably and efficiently. Under various class-imbalanced semi-supervised scenarios, we demonstrate the effectiveness of DARP and its compatibility with state-of-the-art SSL schemes.


Review for NeurIPS paper: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

Neural Information Processing Systems

Summary and Contributions: The paper proposes Distribution Aligning Refinery of Pseudo-label (DARP) for semi-supervised learning (SSL). DARP refines the pseudo-labels so that they match the underlying class distribution of the unlabeled data. The objective is to minimize the KL divergence between the "aligned" pseudo-labels and the original pseudo-labels, subject to the constraint that the "aligned" pseudo-labels are consistent with the desired class distribution of the unlabeled data. To speed up the process, DARP uses a coordinate ascent algorithm on the Lagrangian dual of the objective function. The evaluation was conducted on the CIFAR10 dataset with various artificially induced degrees of imbalance. DARP was used with a few existing algorithms for imbalanced SSL.
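The constrained KL objective described in this summary can be illustrated with a small sketch. This is not the authors' implementation: it assumes uniform sample weights and a known target class distribution, and it uses alternating row/column scaling (iterative proportional fitting), which is a standard way to compute a KL projection onto marginal constraints and can be read as block coordinate ascent on the dual of such a problem. The function name `align_pseudo_labels` and its parameters are hypothetical.

```python
import numpy as np

def align_pseudo_labels(probs, target_dist, n_iters=200, eps=1e-12):
    """Refine soft pseudo-labels so their class totals match a target distribution.

    probs:       (N, K) array of soft pseudo-labels, each row summing to 1.
    target_dist: (K,) desired class distribution of the unlabeled data (sums to 1).
    Assumption: every sample carries equal weight; DARP itself may differ.
    """
    probs = np.asarray(probs, dtype=float)
    n_samples = probs.shape[0]
    # Desired total pseudo-label mass per class.
    target_mass = n_samples * np.asarray(target_dist, dtype=float)
    refined = probs + eps  # keep entries strictly positive
    for _ in range(n_iters):
        # Column step: rescale each class so its total matches the target mass.
        refined = refined * (target_mass / refined.sum(axis=0))
        # Row step: renormalize each sample back to a probability vector.
        refined = refined / refined.sum(axis=1, keepdims=True)
    return refined

# Usage: pseudo-labels biased toward class 0, target favoring class 2.
rng = np.random.default_rng(0)
biased = rng.dirichlet([4.0, 2.0, 2.0], size=200)
aligned = align_pseudo_labels(biased, np.array([0.2, 0.3, 0.5]))
```

Each multiplicative update keeps the refined labels proportional to the original ones up to row and column factors, which is the form the KL-minimizing solution takes under these two sets of constraints.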


Review for NeurIPS paper: Distribution Aligning Refinery of Pseudo-label for Imbalanced Semi-supervised Learning

Neural Information Processing Systems

This paper proposes an approach to semi-supervised learning for imbalanced classes. It is indeed non-trivial to combine local/global/perturbation consistency-based semi-supervised methods with fully supervised methods for imbalanced classes; this paper may be the first work in this direction. The approach is quite general and can be applied on top of any pseudo-labeling-based semi-supervised method. It first estimates the true class-prior probability and then modifies the pseudo-labels, via a constrained convex optimization, so that their class distribution matches this estimate. While the reviewers initially had some concerns (mainly about clarity and the small number of datasets), the authors did a particularly good job in their rebuttal (showing that the class-prior probability can be estimated rather than having to be given).


BaCon: Boosting Imbalanced Semi-supervised Learning via Balanced Feature-Level Contrastive Learning

arXiv.org Artificial Intelligence

Semi-supervised Learning (SSL) reduces the need for extensive annotations in deep learning, but the more realistic challenge of imbalanced data distribution in SSL remains largely unexplored. In Class Imbalanced Semi-supervised Learning (CISSL), the bias introduced by unreliable pseudo-labels can be exacerbated by imbalanced data distributions. Most existing methods address this issue at the instance level through reweighting or resampling, but their performance is heavily limited by their reliance on biased backbone representations. Some other methods do perform feature-level adjustments such as feature blending, but these might introduce unfavorable noise. In this paper, we discuss the benefit of a more balanced feature distribution for the CISSL problem, and further propose a Balanced Feature-Level Contrastive Learning method (BaCon). Our method directly regularizes the distribution of instances' representations in a well-designed contrastive manner. Specifically, class-wise feature centers are computed as the positive anchors, while negative anchors are selected by a straightforward yet effective mechanism. A distribution-related temperature adjustment is leveraged to control the class-wise contrastive degrees dynamically. Our method demonstrates its effectiveness through comprehensive experiments on the CIFAR10-LT, CIFAR100-LT, STL10-LT, and SVHN-LT datasets across various settings. For example, BaCon surpasses the instance-level method FixMatch-based ABC on CIFAR10-LT with a 1.21% accuracy improvement, and outperforms the state-of-the-art feature-level method CoSSL on CIFAR100-LT with a 0.63% accuracy improvement. Under more extreme degrees of imbalance, BaCon also shows better robustness than other methods.
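To make the idea of class-wise feature centers as positive anchors concrete, here is a rough sketch. It is not BaCon itself: the abstract does not specify how negative anchors are chosen or how the temperature depends on the class distribution, so this sketch assumes that all other class centers act as negatives and that a per-class temperature array is simply supplied by the caller. All function and parameter names are hypothetical.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature vector per class; classes with no samples stay at zero."""
    centers = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            centers[c] = features[mask].mean(axis=0)
    return centers

def center_contrastive_loss(features, labels, centers, temperatures, eps=1e-12):
    """Softmax contrastive loss of each feature against the class centers.

    Positive anchor: the center of the sample's own class.
    Negative anchors: all other centers (an assumption, not BaCon's mechanism).
    temperatures: (K,) per-class temperature, chosen by the caller (assumption).
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    c = centers / (np.linalg.norm(centers, axis=1, keepdims=True) + eps)
    # Cosine similarities, scaled by the temperature of each sample's class.
    logits = (f @ c.T) / temperatures[labels][:, None]
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels].mean()

# Usage: two well-separated classes in a 2-D feature space.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
labs = np.array([0, 0, 1, 1])
ctrs = class_centers(feats, labs, num_classes=2)
loss = center_contrastive_loss(feats, labs, ctrs, np.array([0.5, 0.5]))
```

A lower temperature for a class sharpens the softmax for its samples, which is one plausible way a class-dependent temperature could control how strongly each class is pulled toward its center.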


Align, Distill, and Augment Everything All at Once for Imbalanced Semi-Supervised Learning

arXiv.org Artificial Intelligence

Addressing the class imbalance in long-tailed semi-supervised learning (SSL) poses a few significant challenges stemming from differences between the marginal distributions of unlabeled data and the labeled data, as the former is often unknown and potentially distinct from the latter. The first challenge is to avoid biasing the pseudo-labels towards an incorrect distribution, such as that of the labeled data or a balanced distribution, during training. However, we still wish to ensure a balanced unlabeled distribution during inference, which is the second challenge. To address both of these challenges, we propose a three-faceted solution: a flexible distribution alignment that progressively aligns the classifier from a dynamically estimated unlabeled prior towards a balanced distribution, a soft consistency regularization that exploits underconfident pseudo-labels discarded by threshold-based methods, and a schema for expanding the unlabeled set with input data from the labeled partition. This last facet comes in as a response to the commonly-overlooked fact that disjoint partitions of labeled and unlabeled data prevent the benefits of strong data augmentation on the labeled set. Our overall framework requires no additional training cycles, so it will align, distill, and augment everything all at once (ADALLO). Our extensive evaluations of ADALLO on imbalanced SSL benchmark datasets, including CIFAR10-LT, CIFAR100-LT, and STL10-LT with varying degrees of class imbalance, amount of labeled data, and distribution mismatch, demonstrate significant improvements in the performance of imbalanced SSL under large distribution mismatch, as well as competitiveness with state-of-the-art methods when the labeled and unlabeled data follow the same marginal distribution. Our code will be released upon paper acceptance.
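The progressive alignment from an estimated unlabeled prior toward a balanced distribution can be sketched as follows. The abstract does not give the schedule or the adjustment rule, so both are assumptions here: the target is obtained by tempering the estimated prior with an exponent that shrinks to zero as training progresses, and predictions are adjusted by the ratio rescaling commonly used for distribution alignment in SSL. The names `progressive_target` and `align_predictions` are hypothetical.

```python
import numpy as np

def progressive_target(estimated_prior, progress):
    """Interpolate from the estimated prior (progress=0) to uniform (progress=1).

    Assumption: geometric tempering prior ** (1 - progress); ADALLO's actual
    schedule is not described in the abstract.
    """
    prior = np.asarray(estimated_prior, dtype=float)
    tempered = prior ** (1.0 - progress)
    return tempered / tempered.sum()

def align_predictions(probs, model_marginal, target):
    """Rescale predictions by target / model_marginal, then renormalize rows."""
    adjusted = np.asarray(probs, dtype=float) * (target / model_marginal)
    return adjusted / adjusted.sum(axis=1, keepdims=True)

# Usage: a long-tailed estimated prior, halfway through training.
prior = np.array([0.7, 0.2, 0.1])
target = progressive_target(prior, progress=0.5)
batch_probs = np.array([[0.8, 0.1, 0.1], [0.6, 0.3, 0.1]])
aligned = align_predictions(batch_probs, batch_probs.mean(axis=0), target)
```

Early in training the target stays close to the estimated prior, so pseudo-labels are not forced toward an incorrect distribution; near the end the target approaches uniform, which matches the goal of balanced predictions at inference.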