Improving realistic semi-supervised learning with doubly robust estimation

Pham, Khiem, Herrmann, Charles, Zabih, Ramin

Jan-31-2025–arXiv.org Machine Learning

A major challenge in Semi-Supervised Learning (SSL) is the limited information available about the class distribution in the unlabeled data. In many real-world applications this arises from the prevalence of long-tailed distributions, where the standard pseudo-label approach to SSL is biased towards the labeled class distribution and thus performs poorly on unlabeled data. Existing methods typically assume that the unlabeled class distribution is either known a priori, which is unrealistic in most situations, or estimate it on-the-fly using the pseudo-labels themselves. We propose to explicitly estimate the unlabeled class distribution, which is a finite-dimensional parameter, \emph{as an initial step}, using a doubly robust estimator with a strong theoretical guarantee; this estimate can then be integrated into existing methods to pseudo-label the unlabeled data during training more accurately. Experimental results demonstrate that incorporating our techniques into common pseudo-labeling approaches improves their performance.

artificial intelligence, deep learning, machine learning, (16 more...)

arXiv.org Machine Learning

Jan-31-2025

arXiv.org PDF

Add feedback

Country:
- North America > Canada
  - Ontario > Toronto (0.14)
- Europe > United Kingdom
  - England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East
  - Jordan (0.04)

Genre:
- Research Report (0.70)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Unsupervised or Indirectly Supervised Learning (1.00)
  - Statistical Learning (0.93)
  - Neural Networks > Deep Learning (0.46)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found