Unsupervised or Indirectly Supervised Learning
Robust Unsupervised Multi-task and Transfer Learning on Gaussian Mixture Models
Tian, Ye, Weng, Haolei, Feng, Yang
Unsupervised learning has been widely used in many real-world applications. One of the simplest and most important unsupervised learning models is the Gaussian mixture model (GMM). In this work, we study the multi-task learning problem on GMMs, which aims to leverage potentially similar GMM parameter structures among tasks to obtain improved learning performance compared to single-task learning. We propose a multi-task GMM learning procedure based on the EM algorithm that not only can effectively utilize unknown similarity between related tasks but is also robust against a fraction of outlier tasks from arbitrary distributions. The proposed procedure is shown to achieve minimax optimal rate of convergence for both parameter estimation error and the excess mis-clustering error, in a wide range of regimes. Moreover, we generalize our approach to tackle the problem of transfer learning for GMMs, where similar theoretical results are derived. Finally, we demonstrate the effectiveness of our methods through simulations and real data examples. To the best of our knowledge, this is the first work studying multi-task and transfer learning on GMMs with theoretical guarantees.
PG-LBO: Enhancing High-Dimensional Bayesian Optimization with Pseudo-Label and Gaussian Process Guidance
Chen, Taicai, Duan, Yue, Li, Dong, Qi, Lei, Shi, Yinghuan, Gao, Yang
Variational Autoencoder based Bayesian Optimization (VAE-BO) has demonstrated its excellent performance in addressing high-dimensional structured optimization problems. However, current mainstream methods overlook the potential of utilizing a pool of unlabeled data to construct the latent space, while only concentrating on designing sophisticated models to leverage the labeled data. Despite their effective usage of labeled data, these methods often require extra network structures, additional procedure, resulting in computational inefficiency. To address this issue, we propose a novel method to effectively utilize unlabeled data with the guidance of labeled data. Specifically, we tailor the pseudo-labeling technique from semi-supervised learning to explicitly reveal the relative magnitudes of optimization objective values hidden within the unlabeled data. Based on this technique, we assign appropriate training weights to unlabeled data to enhance the construction of a discriminative latent space. Furthermore, we treat the VAE encoder and the Gaussian Process (GP) in Bayesian optimization as a unified deep kernel learning process, allowing the direct utilization of labeled data, which we term as Gaussian Process guidance. This directly and effectively integrates the goal of improving GP accuracy into the VAE training, thereby guiding the construction of the latent space. The extensive experiments demonstrate that our proposed method outperforms existing VAE-BO algorithms in various optimization scenarios. Our code will be published at https://github.com/TaicaiChen/PG-LBO.
FlexSSL : A Generic and Efficient Framework for Semi-Supervised Learning
Qin, Huiling, Zhan, Xianyuan, Li, Yuanxun, Zheng, Yu
Semi-supervised learning holds great promise for many real-world applications, due to its ability to leverage both unlabeled and expensive labeled data. However, most semi-supervised learning algorithms still heavily rely on the limited labeled data to infer and utilize the hidden information from unlabeled data. We note that any semi-supervised learning task under the self-training paradigm also hides an auxiliary task of discriminating label observability. Jointly solving these two tasks allows full utilization of information from both labeled and unlabeled data, thus alleviating the problem of over-reliance on labeled data. This naturally leads to a new generic and efficient learning framework without the reliance on any domain-specific information, which we call FlexSSL. The key idea of FlexSSL is to construct a semi-cooperative "game", which forges cooperation between a main self-interested semi-supervised learning task and a companion task that infers label observability to facilitate main task training. We show with theoretical derivation of its connection to loss re-weighting on noisy labels. Through evaluations on a diverse range of tasks, we demonstrate that FlexSSL can consistently enhance the performance of semi-supervised learning algorithms.
Soft Contrastive Learning for Time Series
Lee, Seunghan, Park, Taeyoung, Lee, Kibok
Contrastive learning has shown to be effective to learn representations from time series in a self-supervised way. However, contrasting similar time series instances or values from adjacent timestamps within a time series leads to ignore their inherent correlations, which results in deteriorating the quality of learned representations. To address this issue, we propose SoftCLT, a simple yet effective soft contrastive learning strategy for time series. This is achieved by introducing instance-wise and temporal contrastive loss with soft assignments ranging from zero to one. Specifically, we define soft assignments for 1) instance-wise contrastive loss by the distance between time series on the data space, and 2) temporal contrastive loss by the difference of timestamps. SoftCLT is a plug-and-play method for time series contrastive learning that improves the quality of learned representations without bells and whistles. In experiments, we demonstrate that SoftCLT consistently improves the performance in various downstream tasks including classification, semi-supervised learning, transfer learning, and anomaly detection, showing stateof-the-art performance. Code is available at this repository: https://github. Time series (TS) data are ubiquitous in many fields, including finance, energy, healthcare, and transportation (Ding et al., 2020; Lago et al., 2018; Solares et al., 2020; Cai et al., 2020). However, annotating TS data can be challenging as it often requires significant domain expertise and time. To overcome the limitation and utilize unlabeled data without annotations, self-supervised learning has emerged as a promising representation learning approach not only in natural language processing (Devlin et al., 2018; Gao et al., 2021) and computer vision (Chen et al., 2020; Dosovitskiy et al., 2021), but also in TS analysis (Franceschi et al., 2019; Yue et al., 2022). In particular, contrastive learning (CL) has demonstrated remarkable performance across different domains (Chen et al., 2020; Gao et al., 2021; Yue et al., 2022). As it is challenging to determine similarities of instances in self-supervised learning, recent CL works apply data augmentation to generate two views per data and take views from the same instance as positive pairs and the others as negatives (Chen et al., 2020).
Joint empirical risk minimization for instance-dependent positive-unlabeled data
Rejchel, Wojciech, Teisseyre, Paweล, Mielniczuk, Jan
Learning from positive and unlabeled data (PU learning) is actively researched machine learning task. The goal is to train a binary classification model based on a training dataset containing part of positives which are labeled, and unlabeled instances. Unlabeled set includes remaining part of positives and all negative observations. An important element in PU learning is modeling of the labeling mechanism, i.e. labels' assignment to positive observations. Unlike in many prior works, we consider a realistic setting for which probability of label assignment, i.e. propensity score, is instance-dependent. In our approach we investigate minimizer of an empirical counterpart of a joint risk which depends on both posterior probability of inclusion in a positive class as well as on a propensity score. The non-convex empirical risk is alternately optimised with respect to parameters of both functions. In the theoretical analysis we establish risk consistency of the minimisers using recently derived methods from the theory of empirical processes. Besides, the important development here is a proposed novel implementation of an optimisation algorithm, for which sequential approximation of a set of positive observations among unlabeled ones is crucial. This relies on modified technique of 'spies' as well as on a thresholding rule based on conditional probabilities. Experiments conducted on 20 data sets for various labeling scenarios show that the proposed method works on par or more effectively than state-of-the-art methods based on propensity function estimation.
Twice Class Bias Correction for Imbalanced Semi-Supervised Learning
Li, Lan, Tao, Bowen, Han, Lu, Zhan, De-chuan, Ye, Han-jia
Differing from traditional semi-supervised learning, class-imbalanced semi-supervised learning presents two distinct challenges: (1) The imbalanced distribution of training samples leads to model bias towards certain classes, and (2) the distribution of unlabeled samples is unknown and potentially distinct from that of labeled samples, which further contributes to class bias in the pseudo-labels during training. To address these dual challenges, we introduce a novel approach called \textbf{T}wice \textbf{C}lass \textbf{B}ias \textbf{C}orrection (\textbf{TCBC}). We begin by utilizing an estimate of the class distribution from the participating training samples to correct the model, enabling it to learn the posterior probabilities of samples under a class-balanced prior. This correction serves to alleviate the inherent class bias of the model. Building upon this foundation, we further estimate the class bias of the current model parameters during the training process. We apply a secondary correction to the model's pseudo-labels for unlabeled samples, aiming to make the assignment of pseudo-labels across different classes of unlabeled samples as equitable as possible. Through extensive experimentation on CIFAR10/100-LT, STL10-LT, and the sizable long-tailed dataset SUN397, we provide conclusive evidence that our proposed TCBC method reliably enhances the performance of class-imbalanced semi-supervised learning.
Transfer and Alignment Network for Generalized Category Discovery
An, Wenbin, Tian, Feng, Shi, Wenkai, Chen, Yan, Wu, Yaqiang, Wang, Qianying, Chen, Ping
Generalized Category Discovery is a crucial real-world task. Despite the improved performance on known categories, current methods perform poorly on novel categories. We attribute the poor performance to two reasons: biased knowledge transfer between labeled and unlabeled data and noisy representation learning on the unlabeled data. To mitigate these two issues, we propose a Transfer and Alignment Network (TAN), which incorporates two knowledge transfer mechanisms to calibrate the biased knowledge and two feature alignment mechanisms to learn discriminative features. Specifically, we model different categories with prototypes and transfer the prototypes in labeled data to correct model bias towards known categories. On the one hand, we pull instances with known categories in unlabeled data closer to these prototypes to form more compact clusters and avoid boundary overlap between known and novel categories. On the other hand, we use these prototypes to calibrate noisy prototypes estimated from unlabeled data based on category similarities, which allows for more accurate estimation of prototypes for novel categories that can be used as reliable learning targets later. After knowledge transfer, we further propose two feature alignment mechanisms to acquire both instance- and category-level knowledge from unlabeled data by aligning instance features with both augmented features and the calibrated prototypes, which can boost model performance on both known and novel categories with less noise. Experiments on three benchmark datasets show that our model outperforms SOTA methods, especially on novel categories. Theoretical analysis is provided for an in-depth understanding of our model in general. Our code and data are available at https://github.com/Lackel/TAN.
Dynamic Sub-graph Distillation for Robust Semi-supervised Continual Learning
Fan, Yan, Wang, Yu, Zhu, Pengfei, Hu, Qinghua
Continual learning (CL) has shown promising results and comparable performance to learning at once in a fully supervised manner. However, CL strategies typically require a large number of labeled samples, making their real-life deployment challenging. In this work, we focus on semi-supervised continual learning (SSCL), where the model progressively learns from partially labeled data with unknown categories. We provide a comprehensive analysis of SSCL and demonstrate that unreliable distributions of unlabeled data lead to unstable training and refinement of the progressing stages. This problem severely impacts the performance of SSCL. To address the limitations, we propose a novel approach called Dynamic Sub-Graph Distillation (DSGD) for semi-supervised continual learning, which leverages both semantic and structural information to achieve more stable knowledge distillation on unlabeled data and exhibit robustness against distribution bias. Firstly, we formalize a general model of structural distillation and design a dynamic graph construction for the continual learning progress. Next, we define a structure distillation vector and design a dynamic sub-graph distillation algorithm, which enables end-to-end training and adaptability to scale up tasks. The entire proposed method is adaptable to various CL methods and supervision settings. Finally, experiments conducted on three datasets CIFAR10, CIFAR100, and ImageNet-100, with varying supervision ratios, demonstrate the effectiveness of our proposed approach in mitigating the catastrophic forgetting problem in semi-supervised continual learning scenarios.
Quantum Generative Adversarial Networks: Bridging Classical and Quantum Realms
Nokhwal, Sahil, Nokhwal, Suman, Pahune, Saurabh, Chaudhary, Ankit
In this pioneering research paper, we present a groundbreaking exploration into the synergistic fusion of classical and quantum computing paradigms within the realm of Generative Adversarial Networks (GANs). Our objective is to seamlessly integrate quantum computational elements into the conventional GAN architecture, thereby unlocking novel pathways for enhanced training processes. Drawing inspiration from the inherent capabilities of quantum bits (qubits), we delve into the incorporation of quantum data representation methodologies within the GAN framework. By capitalizing on the unique quantum features, we aim to accelerate the training process of GANs, offering a fresh perspective on the optimization of generative models. Our investigation deals with theoretical considerations and evaluates the potential quantum advantages that may manifest in terms of training efficiency and generative quality. We confront the challenges inherent in the quantum-classical amalgamation, addressing issues related to quantum hardware constraints, error correction mechanisms, and scalability considerations. This research is positioned at the forefront of quantum-enhanced machine learning, presenting a critical stride towards harnessing the computational power of quantum systems to expedite the training of Generative Adversarial Networks. Through our comprehensive examination of the interface between classical and quantum realms, we aim to uncover transformative insights that will propel the field forward, fostering innovation and advancing the frontier of quantum machine learning.
Unsupervised Learning of Phylogenetic Trees via Split-Weight Embedding
Kong, Yibo, Tiley, George P., Solis-Lemus, Claudia
The Tree of Life is a massive graphical structure which represents the evolutionary process from single cell organisms into the immense biodiversity of living species in present time. Estimating the Tree of Life would not only represent the greatest accomplishment in evolutionary biology and systematics, but it would also allow us to fully understand the development and evolution of important biological traits in nature, in particular, those related to resilience to extinction when exposed to environmental threats such as climate change. Therefore, the development of statistical and machine-learning theory to reconstruct the Tree of Life, especially those scalable to big data, are paramount in evolutionary biology, systematics, and conservation efforts against mass extinctions. Graphical structures that represent evolutionary processes are denoted phylogenetic trees. A phylogenetic tree is a binary tree whose internal nodes represent ancestral species that over time differentiate into two separate species giving rise to its two children nodes (see Figure 1 left). The evolutionary process is then depicted by this bifurcating tree from the root (the origin of life) to the external nodes of the tree (also denoted leaves) which represent the living organisms today.