 Unsupervised or Indirectly Supervised Learning


Graph-based Semi-Supervised Learning via Maximum Discrimination

arXiv.org Machine Learning

Semi-supervised learning (SSL) addresses the critical challenge of training accurate models when labeled data is scarce but unlabeled data is abundant. Graph-based SSL (GSSL) has emerged as a popular framework that captures data structure through graph representations. Classic graph SSL methods, such as Label Propagation and Label Spreading, aim to compute low-dimensional representations where points with the same labels are close in representation space. Although often effective, these methods can be suboptimal on data with complex label distributions. In our work, we develop AUC-spec, a graph approach that computes a low-dimensional representation that maximizes class separation. We compute this representation by optimizing the Area Under the ROC Curve (AUC) as estimated via the labeled points. We provide a detailed analysis of our approach under a product-of-manifold model, and show that the required number of labeled points for AUC-spec is polynomial in the model parameters. Empirically, we show that AUC-spec balances class separation with graph smoothness. It demonstrates competitive results on synthetic and real-world datasets while maintaining computational efficiency comparable to the field's classic and state-of-the-art methods.
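As a rough illustration of "AUC as estimated via the labeled points", the sketch below computes the standard empirical AUC of a one-dimensional score over labeled points only: the fraction of (positive, negative) pairs that the score ranks correctly. It is not the AUC-spec algorithm itself. The function name `empirical_auc` and the example values are hypothetical, and a binary labeling with 1 for positive and 0 for negative is assumed.

```python
def empirical_auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    Assumes binary labels, 1 = positive and 0 = negative.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one labeled point from each class")

    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Hypothetical 1-D representation values for six labeled points.
scores = [0.9, 0.8, 0.35, 0.6, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
auc = empirical_auc(scores, labels)
```

A representation that separates the classes perfectly on the labeled points gives a value of 1.0, and a representation that ranks pairs at random gives about 0.5, which is why this quantity can serve as a class-separation objective.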




We sincerely appreciate insightful comments and positive feedback from the reviewers: important problem (R1

Neural Information Processing Systems

We respond to each comment one by one. We mention this in Line 148; however, we will make it clear in the final draft. Conversely, SSL algorithms use the unlabeled data but they do not consider the class imbalance. We will make this point clear in the final draft. However, to avoid confusion, we will replace X, Y with α, β in the final draft.


Spatial Ensemble: a Novel Model Smoothing Mechanism for Student-Teacher Framework

Neural Information Processing Systems

Second, TMA constrains the variance of the teachers to be small to avoid inconsistent labels produced during two adjacent updates.


DP-SSL: Towards Robust Semi-supervised Learning with A Few Labeled Samples

Neural Information Processing Systems

However, when the size of labeled data is very small (say a few labeled samples per class), SSL performs poorly and unstably, possibly due to the low quality of learned pseudo labels. In this paper, we propose a new SSL method called DP-SSL that adopts an innovative data programming (DP) scheme to generate probabilistic labels for unlabeled data. Different from existing DP methods that rely on human experts to provide initial labeling functions (LFs), we develop a multiple-choice learning (MCL) based approach to automatically generate LFs from scratch in SSL style. With the noisy labels produced by the LFs, we design a label model to resolve the conflict and overlap among the noisy labels, and finally infer probabilistic labels for unlabeled samples.
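To make the idea of turning conflicting, overlapping LF outputs into probabilistic labels concrete, here is a minimal sketch that uses normalized vote counting. This is a deliberately simple stand-in and not the label model proposed in DP-SSL. The `ABSTAIN` convention, the function name `vote_probabilities`, and the example votes are all assumptions made for illustration.

```python
from collections import Counter

ABSTAIN = -1  # assumed convention: an LF returns -1 when it does not vote


def vote_probabilities(lf_outputs, num_classes):
    """Turn the votes of several LFs on one sample into class probabilities.

    Abstaining LFs are ignored. If every LF abstains, fall back to a
    uniform distribution over the classes.
    """
    votes = Counter(v for v in lf_outputs if v != ABSTAIN)
    total = sum(votes.values())
    if total == 0:
        return [1.0 / num_classes] * num_classes
    return [votes[c] / total for c in range(num_classes)]


# Four hypothetical LFs on one unlabeled sample: two vote for class 0,
# one votes for class 2 (a conflict), and one abstains.
probs = vote_probabilities([0, 0, 2, ABSTAIN], num_classes=3)
```

Vote counting treats every LF as equally reliable. A learned label model would instead weight the LFs, which matters most when they disagree.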