Inductive Learning
Prompt-Driven Feature Diffusion for Open-World Semi-Supervised Learning
Heidari, Marzi, Zhang, Hanping, Guo, Yuhong
In this paper, we present a novel approach termed Prompt-Driven Feature Diffusion (PDFD) within a semi-supervised learning framework for Open World Semi-Supervised Learning (OW-SSL). At its core, PDFD deploys an efficient feature-level diffusion model guided by class-specific prompts to support discriminative feature representation learning and feature generation, addressing the lack of labeled data for unseen classes in OW-SSL. In particular, PDFD uses class prototypes as prompts in the diffusion model, leveraging their class-discriminative and semantic generalization ability to condition and guide the diffusion process across all seen and unseen classes. Furthermore, PDFD incorporates a class-conditional adversarial loss for diffusion model training, ensuring that the features generated via the diffusion process are discriminatively aligned with the class-conditional features of the real data. Additionally, the class prototypes of the unseen classes are computed using only unlabeled instances with confident predictions within a semi-supervised learning framework. We conduct extensive experiments to evaluate the proposed PDFD. The empirical results show that PDFD substantially outperforms many existing state-of-the-art methods.
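The prototype step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: features are plain lists of floats, a prototype is the mean feature of a class, seen-class prototypes are assumed to come from labeled features only, and the confidence threshold of 0.9 is a hypothetical value. Only the rule that unseen-class prototypes use confidently predicted unlabeled instances comes from the abstract.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector]) -> Vector:
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def class_prototypes(
    labeled: List[Tuple[Vector, int]],
    unlabeled: List[Tuple[Vector, Sequence[float]]],
    seen: Set[int],
    unseen: Set[int],
    threshold: float = 0.9,  # hypothetical confidence cut-off
) -> Dict[int, Vector]:
    members: Dict[int, List[Vector]] = {c: [] for c in seen | unseen}
    # Seen classes: assumed here to use labeled features only.
    for feature, label in labeled:
        if label in seen:
            members[label].append(feature)
    # Unseen classes: only unlabeled instances with a confident prediction.
    for feature, logits in unlabeled:
        probs = softmax(logits)
        confidence = max(probs)
        predicted = probs.index(confidence)
        if predicted in unseen and confidence >= threshold:
            members[predicted].append(feature)
    # Classes with no members yet get no prototype.
    return {c: mean_vector(v) for c, v in members.items() if v}


if __name__ == "__main__":
    labeled = [([1.0, 0.0], 0), ([0.8, 0.2], 0), ([0.0, 1.0], 1)]
    unlabeled = [([5.0, 5.0], [0.0, 0.0, 4.0]), ([3.0, 3.0], [0.0, 0.0, 0.5])]
    print(class_prototypes(labeled, unlabeled, seen={0, 1}, unseen={2}))
```

In the usage example, only the first unlabeled instance clears the threshold, so it alone defines the prototype of unseen class 2; the low-confidence instance is ignored rather than allowed to blur the prototype.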
On Training Data Influence of GPT Models
Liu, Qingyi, Chai, Yekun, Wang, Shuohuan, Sun, Yu, Peng, Qiwei, Wang, Keze, Wu, Hua
Amid rapid advancements in generative language models, research on how training data shapes the performance of GPT models is still emerging. This paper presents GPTfluence, a novel approach that uses a featurized simulation to assess the impact of training examples on the training dynamics of GPT models. Our approach not only traces the influence of individual training instances on performance trajectories, such as loss and other key metrics, at targeted test points, but also enables a comprehensive comparison with existing methods across various training scenarios in GPT models ranging from 14 million to 2.8 billion parameters, on a range of downstream tasks. Unlike earlier methods, which struggle to generalize to new data, GPTfluence introduces a parameterized simulation of training dynamics and demonstrates robust generalization to unseen training data. This adaptability holds across both fine-tuning and instruction-tuning scenarios, spanning natural language understanding and generation tasks. We will make our code and data publicly available.
Integration of Self-Supervised BYOL in Semi-Supervised Medical Image Recognition
Feng, Hao, Jia, Yuanzhe, Xu, Ruijia, Prasad, Mukesh, Anaissi, Ali, Braytee, Ali
Image recognition techniques rely heavily on abundant labeled data, particularly in medical contexts. The difficulty of obtaining labeled data has made self-supervised and semi-supervised learning prominent, especially in scenarios with limited annotated data. In this paper, we propose an approach that integrates self-supervised learning into semi-supervised models to enhance medical image recognition. Our method begins by pre-training on unlabeled data using the BYOL method. We then merge the pseudo-labeled and labeled datasets to train a neural network classifier, which is refined through iterative fine-tuning. Experimental results on three different datasets demonstrate that our approach effectively leverages unlabeled data and outperforms existing methods in accuracy for medical image recognition.
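The abstract only names BYOL as the pre-training method, so the sketch below illustrates the two core ingredients of the original BYOL formulation rather than this paper's code: the normalized regression loss between the online network's prediction and the target network's projection, and the exponential-moving-average (EMA) update of the target weights. Network outputs and weights are stood in for by flat lists of floats (an assumption for brevity); the default decay of 0.996 is the base value reported for BYOL, not a setting confirmed for this paper.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def byol_loss(online_prediction: Sequence[float], target_projection: Sequence[float]) -> float:
    """Squared distance between L2-normalized vectors, equal to 2 - 2 * cos(p, z)."""
    p = l2_normalize(online_prediction)
    z = l2_normalize(target_projection)
    return sum((a - b) ** 2 for a, b in zip(p, z))


def symmetric_byol_loss(p1: Sequence[float], z2: Sequence[float],
                        p2: Sequence[float], z1: Sequence[float]) -> float:
    """Each view's online prediction regresses the other view's target projection."""
    return byol_loss(p1, z2) + byol_loss(p2, z1)


def ema_update(target: Sequence[float], online: Sequence[float], tau: float = 0.996) -> List[float]:
    """Target weights follow an exponential moving average of the online weights.

    No gradient is applied to the target network; it changes only through this update.
    """
    return [tau * t + (1.0 - tau) * o for t, o in zip(target, online)]
```

Because the loss is computed on normalized vectors, it ranges from 0 (same direction) to 4 (opposite directions), independent of embedding scale.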
Self-Supervised Learning Featuring Small-Scale Image Dataset for Treatable Retinal Diseases Classification
Huang, Luffina C., Chiu, Darren J., Mehta, Manish
Automated medical diagnosis through image-based neural networks has grown in popularity and matured over the years. Nevertheless, it is constrained by the scarcity of medical images and the high cost of manual annotation. Self-Supervised Learning (SSL) is a good alternative to Transfer Learning (TL) and is suitable for imbalanced image datasets. In this study, we assess four pretrained SSL models and two TL models for treatable retinal disease classification using small-scale Optical Coherence Tomography (OCT) training sets ranging from 125 to 4,000 images with balanced or imbalanced class distributions. The proposed SSL model achieves a state-of-the-art accuracy of 98.84% using only 4,000 training images. Our results suggest that the SSL models provide superior performance under both the balanced and imbalanced training scenarios. The SSL model with the MoCo-v2 scheme performs consistently well under the imbalanced scenario and, in particular, surpasses the other models when the training set has fewer than 500 images.
Can We Break Free from Strong Data Augmentations in Self-Supervised Learning?
Gowda, Shruthi, Arani, Elahe, Zonooz, Bahram
Self-supervised learning (SSL) has emerged as a promising solution to the challenge of limited labeled data in deep neural networks (DNNs), offering scalability potential. However, the impact of design dependencies within the SSL framework remains insufficiently investigated. In this study, we comprehensively explore SSL behavior across a spectrum of augmentations, revealing their crucial role in shaping SSL model performance and learning mechanisms. Leveraging these insights, we propose a novel learning approach that integrates prior knowledge, with the aim of curtailing the need for extensive data augmentations and thereby amplifying the efficacy of learned representations. Notably, our findings underscore that SSL models imbued with prior knowledge exhibit reduced texture bias, diminished reliance on shortcuts and augmentations, and improved robustness against both natural and adversarial corruptions. These findings not only illuminate a new direction in SSL research, but also pave the way for enhancing DNN performance while alleviating the need for intensive data augmentation, thereby enhancing scalability and real-world problem-solving capabilities.
Deep neural networks (DNNs) have proven to be highly effective in encoding patterns in data distributions to produce powerful and rich representations that have improved generalization performance across various perception tasks, such as classification, detection, and segmentation. However, one of their major limitations is that DNNs are data-hungry, and annotating millions of available data samples is expensive. Self-supervised learning (SSL) has been proposed as a promising solution to this issue, enabling the learning of useful representations without manual annotations. The self-supervised learning paradigm must ensure that the resulting features are generic enough to be applicable to a wide range of real-world applications.
Various SSL methods, including pretext-based (Gidaris et al., 2018; Noroozi & Favaro, 2016), contrastive-based (Chen et al., 2020a; He et al.,
Figure 1: The impact of augmentations on SSL methods is critical: removing strong augmentations from SSL training can result in a significant drop in their performance.
Incremental Self-training for Semi-supervised Learning
Guo, Jifeng, Liu, Zhulin, Zhang, Tong, Chen, C. L. Philip
Semi-supervised learning provides a way to reduce the dependency of machine learning on labeled data. As one of the efficient semi-supervised techniques, self-training (ST) has received increasing attention, and several advancements have emerged to address the challenges associated with noisy pseudo-labels. Previous works on self-training acknowledge the importance of unlabeled data but have neither examined how to use it efficiently nor addressed the high time cost of iterative learning. This paper proposes Incremental Self-training (IST) for semi-supervised learning to fill these gaps. Unlike ST, which processes all data indiscriminately, IST processes data in batches and assigns pseudo-labels first to unlabeled samples with high certainty. Then, once the model has stabilized, it processes the data around the decision boundary, enhancing classifier performance. IST is simple yet effective and is compatible with existing self-training-based semi-supervised learning methods. We verify the proposed IST on five datasets and two types of backbones, where it improves both recognition accuracy and learning speed. Notably, it outperforms state-of-the-art competitors on three challenging image classification tasks.
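The two-phase schedule described above can be sketched on a toy problem. Everything concrete here is an assumption for illustration: a nearest-centroid classifier stands in for the backbone, certainty is a softmax over negative squared distances, the threshold is 0.9, and "after the model has stabilized" is approximated as "after every unlabeled batch has been seen". It shows the ordering idea (confident samples first, boundary samples later), not the authors' IST implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Model = Dict[int, Vector]


def fit_centroids(points: List[Vector], labels: List[int]) -> Model:
    """Toy classifier: one centroid per class."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = {}
    for x, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for d, value in enumerate(x):
            acc[d] += value
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model: Model, x: Sequence[float]) -> Tuple[int, float]:
    """Return (label, certainty) from a softmax over negative squared distances."""
    classes = sorted(model)
    scores = [-sum((a - b) ** 2 for a, b in zip(x, model[c])) for c in classes]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    best = max(range(len(classes)), key=lambda i: exps[i])
    return classes[best], exps[best] / sum(exps)


def incremental_self_training(
    x_labeled: List[Vector],
    y_labeled: List[int],
    x_unlabeled: List[Vector],
    batch_size: int = 2,
    threshold: float = 0.9,  # hypothetical certainty cut-off
) -> Tuple[Model, int]:
    points, labels = list(x_labeled), list(y_labeled)
    model = fit_centroids(points, labels)
    deferred: List[Vector] = []
    # Phase 1: walk through the unlabeled data batch by batch, pseudo-labeling
    # only high-certainty samples and refitting after each batch.
    for start in range(0, len(x_unlabeled), batch_size):
        for x in x_unlabeled[start:start + batch_size]:
            label, certainty = predict(model, x)
            if certainty >= threshold:
                points.append(x)
                labels.append(label)
            else:
                deferred.append(x)  # likely near the decision boundary
        model = fit_centroids(points, labels)
    # Phase 2 (assumed stabilization point: after all batches): label deferred samples.
    for x in deferred:
        label, _ = predict(model, x)
        points.append(x)
        labels.append(label)
    return fit_centroids(points, labels), len(deferred)


if __name__ == "__main__":
    model, n_deferred = incremental_self_training(
        [[0.0], [10.0]], [0, 1], [[1.0], [9.0], [5.0], [0.5]]
    )
    print(model, n_deferred)
```

In the usage example, the sample at 5.0 sits exactly between the two refitted centroids, so it is deferred to phase 2 instead of being pseudo-labeled early by a still-moving model.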
An Experimental Comparison Of Multi-view Self-supervised Methods For Music Tagging
Meseguer-Brocal, Gabriel, Desblancs, Dorian, Hennequin, Romain
Self-supervised learning has emerged as a powerful way to pre-train generalizable machine learning models on large amounts of unlabeled data. It is particularly compelling in the music domain, where obtaining labeled data is time-consuming, error-prone, and ambiguous. During self-supervised training, models are trained on pretext tasks, with the primary objective of acquiring robust and informative features that can later be fine-tuned for specific downstream tasks. The choice of pretext task is critical, as it guides the model to shape the feature space with meaningful constraints for information encoding. In the context of music, most works have relied on contrastive learning or masking techniques. In this study, we expand the scope of pretext tasks applied to music by investigating and comparing the performance of new self-supervised methods for music tagging. We open-source a simple ResNet model trained on a diverse catalog of millions of tracks. Our results demonstrate that, although most of these pre-training methods yield similar downstream results, contrastive learning consistently achieves better downstream performance than the other self-supervised pre-training methods. This also holds in a limited-data downstream setting.
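To make the contrastive pretext task concrete, the sketch below implements a generic SimCLR-style NT-Xent (normalized temperature-scaled cross-entropy) loss over two augmented views per excerpt. The abstract does not specify which contrastive objective the study uses, so this loss, the cosine similarity, and the temperature of 0.1 are all assumptions chosen as a common representative; embeddings are plain lists of floats.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def nt_xent(view1: List[List[float]], view2: List[List[float]], temperature: float = 0.1) -> float:
    """view1[i] and view2[i] embed two augmented views of the same audio excerpt."""
    z = view1 + view2
    n = len(view1)
    total = 0.0
    for i in range(2 * n):
        positive = (i + n) % (2 * n)  # index of the other view of the same excerpt
        # Scaled similarities to every other embedding in the batch (positive + negatives).
        sims = {k: cosine(z[i], z[k]) / temperature for k in range(2 * n) if k != i}
        m = max(sims.values())
        log_denominator = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += log_denominator - sims[positive]  # cross-entropy with the positive as target
    return total / (2 * n)


if __name__ == "__main__":
    anchors = [[1.0, 0.0], [0.0, 1.0]]
    print(nt_xent(anchors, anchors))                   # views agree: low loss
    print(nt_xent(anchors, [[0.0, 1.0], [1.0, 0.0]]))  # views swapped: high loss
```

Every other excerpt in the batch acts as a negative, which is why such objectives benefit from large batches of diverse tracks.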
WikiSplit++: Easy Data Refinement for Split and Rephrase
Tsukagoshi, Hayato, Hirao, Tsutomu, Morishita, Makoto, Chousa, Katsuki, Sasano, Ryohei, Takeda, Koichi
The task of Split and Rephrase, which splits a complex sentence into multiple simple sentences with the same meaning, improves readability and enhances the performance of downstream tasks in natural language processing (NLP). However, although Split and Rephrase can be improved using a text-to-text generation approach that applies encoder-decoder models fine-tuned on a large-scale dataset, it still suffers from hallucinations and under-splitting. To address these issues, this paper presents a simple and strong data refinement approach. Here, we create WikiSplit++ by removing instances from WikiSplit in which at least one of the simple sentences is not entailed by the complex sentence, and by reversing the order of the reference simple sentences. Experimental results show that training with WikiSplit++ leads to better performance than training with WikiSplit, even with fewer training instances. In particular, our approach yields significant gains in the number of splits and in the entailment ratio, a proxy for measuring hallucinations.
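The two refinement steps can be sketched as a filter followed by a reordering. The entailment judge is the assumption here: the abstract does not say how entailment is decided, so `toy_entails` is a hypothetical word-overlap stand-in used only to make the example runnable; in practice it would be replaced by a natural language inference model.

```python
from typing import Callable, List, Tuple

Example = Tuple[str, List[str]]  # (complex sentence, reference simple sentences)


def refine(dataset: List[Example], entails: Callable[[str, str], bool]) -> List[Example]:
    refined = []
    for complex_sent, simple_sents in dataset:
        # Keep the instance only if the complex sentence entails every simple sentence.
        if all(entails(complex_sent, s) for s in simple_sents):
            # Reverse the order of the reference simple sentences.
            refined.append((complex_sent, list(reversed(simple_sents))))
    return refined


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: every hypothesis word appears in the premise."""
    def words(text: str) -> set:
        return set(text.lower().replace(".", "").split())
    return words(hypothesis) <= words(premise)


if __name__ == "__main__":
    data = [
        ("Alice sang and Bob danced.", ["Alice sang.", "Bob danced."]),
        ("Alice sang.", ["Alice sang.", "Carol cried."]),  # "Carol cried." is hallucinated
    ]
    print(refine(data, toy_entails))
```

Passing the entailment judge in as a function keeps the refinement logic independent of whichever NLI model is used.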
Stability and Generalization in Free Adversarial Training
Cheng, Xiwei, Fu, Kexin, Farnia, Farzan
While deep neural networks (DNNs) have led to remarkable results in standard supervised learning tasks in computer vision and natural language processing, they are widely recognized to be susceptible to minor, adversarially-designed perturbations of their input data, commonly referred to as adversarial attacks [1, 2]. Adversarial examples are typically designed by finding the worst-case norm-constrained perturbation that has the maximum impact on the classification loss at an input data point. To combat norm-bounded adversarial attacks, adversarial training (AT) methods [3], which learn a DNN classifier using adversarially-perturbed training examples, have been shown to significantly improve the robustness of a DNN against norm-bounded adversarial attacks. Several variants of AT methods have been developed in the machine learning community to accelerate and facilitate the application of AT algorithms to large-scale machine learning problems [4, 5]. While AT algorithms have achieved state-of-the-art robustness scores against standard norm-bounded adversarial attacks, the gap between their performance on training and test data has frequently been observed to be significantly greater than the generalization error of DNNs learned by standard empirical risk minimization (ERM) [6, 7]. To understand this large generalization gap in adversarial training, several theoretical and empirical studies have focused on the generalization properties of adversarially-trained models [8, 9]. These studies have attempted to analyze the generalization error in learning adversarially-robust models and to reduce the generalization gap by applying explicit and implicit regularization techniques such as early stopping and Lipschitz regularization. Specifically, several recent works [10-12] have focused on the connections between the optimization and generalization behavior of adversarially-trained models.
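As a concrete reference point for the accelerated AT variants mentioned above, here is a minimal sketch of the "free" adversarial training scheme: each minibatch is replayed several times, and each replay reuses one gradient computation both to descend on the weights and to take a sign-gradient ascent step on a perturbation that is kept inside an L-infinity ball and warm-started across minibatches. The model (logistic regression with labels in {-1, +1}), the minibatch size of 1, and all hyperparameters are assumptions for illustration; this is not the paper's experimental setup.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(t: float) -> float:
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def sign(g: float) -> int:
    return (g > 0) - (g < 0)


def free_adversarial_training(
    data: List[Tuple[Sequence[float], int]],
    epsilon: float = 0.1,
    lr: float = 0.1,
    replays: int = 4,
    epochs: int = 5,
) -> Tuple[List[float], List[float]]:
    """Free AT for logistic regression, labels y in {-1, +1}.

    Targets min_w E[ max_{||delta||_inf <= epsilon} log(1 + exp(-y * w . (x + delta))) ].
    """
    dim = len(data[0][0])
    w = [0.0] * dim
    delta = [0.0] * dim  # warm-started: carried over from one minibatch to the next
    for _ in range(epochs):
        for x, y in data:  # minibatch of size 1 for simplicity
            for _ in range(replays):  # replay the same minibatch several times
                x_adv = [xi + di for xi, di in zip(x, delta)]
                margin = y * sum(wi * xi for wi, xi in zip(w, x_adv))
                coeff = -y * sigmoid(-margin)  # d loss / d (w . x_adv)
                # One gradient computation yields both gradients.
                grad_w = [coeff * xi for xi in x_adv]
                grad_x = [coeff * wi for wi in w]
                w = [wi - lr * g for wi, g in zip(w, grad_w)]  # descend on the weights
                # Sign-gradient ascent on delta, clipped back onto the L-inf ball.
                delta = [
                    max(-epsilon, min(epsilon, di + epsilon * sign(g)))
                    for di, g in zip(delta, grad_x)
                ]
    return w, delta


if __name__ == "__main__":
    toy = [([1.0, 1.0], 1), ([2.0, 1.5], 1), ([-1.0, -1.0], -1), ([-2.0, -1.0], -1)]
    print(free_adversarial_training(toy))
```

The appeal of this scheme is cost: the perturbation update comes from the same gradient computation as the weight update, so robustness is obtained at roughly the price of standard training with replayed minibatches.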
Enhancing Fairness and Performance in Machine Learning Models: A Multi-Task Learning Approach with Monte-Carlo Dropout and Pareto Optimality
The term bias was first introduced in the machine learning domain by Tom Mitchell in his 1980 paper titled "The need for biases in learning generalizations" Mitchell [1980]. In this sense, bias refers to giving importance to particular features to improve generalization. This general notion of bias is positive and necessary for models to perform well, as it prevents them from hyper-focusing on specific samples over others. However, bias can also be negative in machine learning. Negative bias can be defined as an inaccurate assumption made by a machine learning algorithm that is systematically or historically prejudiced against certain groups of people Zanna et al. [2022]. Decisions made by such biased algorithms can adversely affect particular social groups, for example, those defined by sex, race, age, marital status, disability, etc., when the algorithms are used to make autonomous decisions in life-changing cases such as health, hiring, education, and criminal sentencing. Negative bias can be introduced into the machine learning pipeline in two main ways: through the data or through the algorithm itself Blanzeisky and Cunningham [2021]. Bias due to data, also known as a negative legacy Cunningham and Delany [2021], Kamishima et al. [2012], can be caused by an imbalance in the representation of different population categories