unlabeled client
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.04)
- Europe > Italy (0.04)
- Asia > Japan (0.04)
- Asia > China (0.04)
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to'fine-tune global model with labeled data' and'generate pseudo-labels with the global model.' We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Europe > Italy (0.04)
- (2 more...)
- Information Technology (0.68)
- Health & Medicine (0.46)
Federated Learning with Unlabeled Clients: Personalization Can Happen in Low Dimensions
Zakerinia, Hossein, Scott, Jonathan, Lampert, Christoph H.
Personalized federated learning has emerged as a popular approach to training on devices holding statistically heterogeneous data, known as clients. However, most existing approaches require a client to have labeled data for training or finetuning in order to obtain their own personalized model. In this paper we address this by proposing FLowDUP, a novel method that is able to generate a personalized model using only a forward pass with unlabeled data. The generated model parameters reside in a low-dimensional subspace, enabling efficient communication and computation. FLowDUP's learning objective is theoretically motivated by our new transductive multi-task PAC-Bayesian generalization bound, that provides performance guarantees for unlabeled clients. The objective is structured in such a way that it allows both clients with labeled data and clients with only unlabeled data to contribute to the training process. To supplement our theoretical results we carry out a thorough experimental evaluation of FLowDUP, demonstrating strong empirical performance on a range of datasets with differing sorts of statistically heterogeneous clients. Through numerous ablation studies, we test the efficacy of the individual components of the method.
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data.
Learning Unlabeled Clients Divergence via Anchor Model Aggregation for Federated Semi-supervised Learning
Elbatel, Marawan, Wang, Hualiang, Chen, Jixiang, Wang, Hao, Li, Xiaomeng
Federated semi-supervised learning (FedSemi) refers to scenarios where there may be clients with fully labeled data, clients with partially labeled, and even fully unlabeled clients while preserving data privacy. However, challenges arise from client drift due to undefined heterogeneous class distributions and erroneous pseudo-labels. Existing FedSemi methods typically fail to aggregate models from unlabeled clients due to their inherent unreliability, thus overlooking unique information from their heterogeneous data distribution, leading to sub-optimal results. In this paper, we enable unlabeled client aggregation through SemiAnAgg, a novel Semi-supervised Anchor-Based federated Aggregation. SemiAnAgg learns unlabeled client contributions via an anchor model, effectively harnessing their informative value. Our key idea is that by feeding local client data to the same global model and the same consistently initialized anchor model (i.e., random model), we can measure the importance of each unlabeled client accordingly. Extensive experiments demonstrate that SemiAnAgg achieves new state-of-the-art results on four widely used FedSemi benchmarks, leading to substantial performance improvements: a 9% increase in accuracy on CIFAR-100 and a 7.6% improvement in recall on the medical dataset ISIC-18, compared with prior state-of-the-art. Code is available at: https://github.com/xmed-lab/SemiAnAgg.
- Asia > China > Hong Kong (0.05)
- Europe > Latvia > Lubāna Municipality > Lubāna (0.05)
- North America > United States > New Jersey (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.73)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.63)
Rethinking Semi-Supervised Federated Learning: How to co-train fully-labeled and fully-unlabeled client imaging data
Saha, Pramit, Mishra, Divyanshu, Noble, J. Alison
The most challenging, yet practical, setting of semi-supervised federated learning (SSFL) is where a few clients have fully labeled data whereas the other clients have fully unlabeled data. This is particularly common in healthcare settings where collaborating partners (typically hospitals) may have images but not annotations. The bottleneck in this setting is the joint training of labeled and unlabeled clients as the objective function for each client varies based on the availability of labels. This paper investigates an alternative way for effective training with labeled and unlabeled clients in a federated setting. We propose a novel learning scheme specifically designed for SSFL which we call Isolated Federated Learning (IsoFed) that circumvents the problem by avoiding simple averaging of supervised and semi-supervised models together. In particular, our training approach consists of two parts - (a) isolated aggregation of labeled and unlabeled client models, and (b) local self-supervised pretraining of isolated global models in all clients. We evaluate our model performance on medical image datasets of four different modalities publicly available within the biomedical image classification benchmark MedMNIST. We further vary the proportion of labeled clients and the degree of heterogeneity to demonstrate the effectiveness of the proposed method under varied experimental settings.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Italy (0.04)
- Europe > France > Grand Est > Bas-Rhin > Strasbourg (0.04)
- (2 more...)
Dual Class-Aware Contrastive Federated Semi-Supervised Learning
Guo, Qi, Qi, Yong, Qi, Saiyu, Wu, Di
Federated semi-supervised learning (FSSL), facilitates labeled clients and unlabeled clients jointly training a global model without sharing private data. Existing FSSL methods predominantly employ pseudo-labeling and consistency regularization to exploit the knowledge of unlabeled data, achieving notable success in raw data utilization. However, these training processes are hindered by large deviations between uploaded local models of labeled and unlabeled clients, as well as confirmation bias introduced by noisy pseudo-labels, both of which negatively affect the global model's performance. In this paper, we present a novel FSSL method called Dual Class-aware Contrastive Federated Semi-Supervised Learning (DCCFSSL). This method accounts for both the local class-aware distribution of each client's data and the global class-aware distribution of all clients' data within the feature space. By implementing a dual class-aware contrastive module, DCCFSSL establishes a unified training objective for different clients to tackle large deviations and incorporates contrastive information in the feature space to mitigate confirmation bias. Moreover, DCCFSSL introduces an authentication-reweighted aggregation technique to improve the server's aggregation robustness. Our comprehensive experiments show that DCCFSSL outperforms current state-of-the-art methods on three benchmark datasets and surpasses the FedAvg with relabeled unlabeled clients on CIFAR-10, CIFAR-100, and STL-10 datasets. To our knowledge, we are the first to present an FSSL method that utilizes only 10\% labeled clients, while still achieving superior performance compared to standard federated supervised learning, which uses all clients with labeled data.
- Asia > China > Shaanxi Province > Xi'an (0.05)
- Asia > China > Hong Kong (0.04)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- (2 more...)
- Research Report > New Finding (0.46)
- Research Report > Promising Solution (0.34)
- Information Technology > Security & Privacy (1.00)
- Education (1.00)
Federated Semi-Supervised Learning with Annotation Heterogeneity
Shang, Xinyi, Huang, Gang, Lu, Yang, Lou, Jian, Han, Bo, Cheung, Yiu-ming, Wang, Hanzi
Federated Semi-Supervised Learning (FSSL) aims to learn a global model from different clients in an environment with both labeled and unlabeled data. Most of the existing FSSL work generally assumes that both types of data are available on each client. In this paper, we study a more general problem setup of FSSL with annotation heterogeneity, where each client can hold an arbitrary percentage (0%-100%) of labeled data. To this end, we propose a novel FSSL framework called Heterogeneously Annotated Semi-Supervised LEarning (HASSLE). Specifically, it is a dual-model framework with two models trained separately on labeled and unlabeled data such that it can be simply applied to a client with an arbitrary labeling percentage. Furthermore, a mutual learning strategy called Supervised-Unsupervised Mutual Alignment (SUMA) is proposed for the dual models within HASSLE with global residual alignment and model proximity alignment. Subsequently, the dual models can implicitly learn from both types of data across different clients, although each dual model is only trained locally on a single type of data. Experiments verify that the dual models in HASSLE learned by SUMA can mutually learn from each other, thereby effectively utilizing the information of both types of data across different clients.
- North America > Canada > Ontario > Toronto (0.14)
- North America > United States > Virginia (0.04)
- Europe > Italy (0.04)
- (3 more...)
SemiFL: Semi-Supervised Federated Learning for Unlabeled Clients with Alternate Training
Diao, Enmao, Ding, Jie, Tarokh, Vahid
Federated Learning allows the training of machine learning models by using the computation and private data resources of many distributed clients. Most existing results on Federated Learning (FL) assume the clients have ground-truth labels. However, in many practical scenarios, clients may be unable to label task-specific data due to a lack of expertise or resource. We propose SemiFL to address the problem of combining communication-efficient FL such as FedAvg with Semi-Supervised Learning (SSL). In SemiFL, clients have completely unlabeled data and can train multiple local epochs to reduce communication costs, while the server has a small amount of labeled data. We provide a theoretical understanding of the success of data augmentation-based SSL methods to illustrate the bottleneck of a vanilla combination of communication-efficient FL with SSL. To address this issue, we propose alternate training to `fine-tune global model with labeled data' and `generate pseudo-labels with the global model.' We conduct extensive experiments and demonstrate that our approach significantly improves the performance of a labeled server with unlabeled clients training with multiple local epochs. Moreover, our method outperforms many existing SSFL baselines and performs competitively with the state-of-the-art FL and SSL results.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > North Carolina > Durham County > Durham (0.04)
- Europe > Italy (0.04)
- (2 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.67)