Chen, Chaochao
Federated Learning on Non-iid Data via Local and Global Distillation
Zheng, Xiaolin, Ying, Senci, Zheng, Fei, Yin, Jianwei, Zheng, Longfei, Chen, Chaochao, Dong, Fengqin
Most existing federated learning algorithms are based on the vanilla FedAvg scheme. However, with the increase of data complexity and the number of model parameters, the amount of communication traffic and the number of iteration rounds for training such algorithms increases significantly, especially in non-independently and homogeneously distributed scenarios, where they do not achieve satisfactory performance. In this work, we propose FedND: federated learning with noise distillation. The main idea is to use knowledge distillation to optimize the model training process. In the client, we propose a self-distillation method to train the local model. In the server, we generate noisy samples for each client and use them to distill other clients. Finally, the global model is obtained by the aggregation of local models. Experimental results show that the algorithm achieves the best performance and is more communication-efficient than state-of-the-art methods.
Adaptive Client Sampling in Federated Learning via Online Learning with Bandit Feedback
Zhao, Boxin, Wang, Lingxiao, Kolar, Mladen, Liu, Ziqi, Zhang, Zhiqiang, Zhou, Jun, Chen, Chaochao
Due to the high cost of communication, federated learning (FL) systems need to sample a subset of clients that are involved in each round of training. As a result, client sampling plays an important role in FL systems as it affects the convergence rate of optimization algorithms used to train machine learning models. Despite its importance, there is limited work on how to sample clients effectively. In this paper, we cast client sampling as an online learning task with bandit feedback, which we solve with an online stochastic mirror descent (OSMD) algorithm designed to minimize the sampling variance. We then theoretically show how our sampling method can improve the convergence speed of optimization algorithms. To handle the tuning parameters in OSMD that depend on the unknown problem parameters, we use the online ensemble method and doubling trick. We prove a dynamic regret bound relative to any sampling sequence. The regret bound depends on the total variation of the comparator sequence, which naturally captures the intrinsic difficulty of the problem. To the best of our knowledge, these theoretical contributions are new and the proof technique is of independent interest. Through both synthetic and real data experiments, we illustrate advantages of the proposed client sampling algorithm over the widely used uniform sampling and existing online learning based sampling strategies. The proposed adaptive sampling procedure is applicable beyond the FL problem studied here and can be used to improve the performance of stochastic optimization procedures such as stochastic gradient descent and stochastic coordinate descent.
Reducing Communication for Split Learning by Randomized Top-k Sparsification
Zheng, Fei, Chen, Chaochao, Lyu, Lingjuan, Yao, Binhui
Split learning is a simple solution for Vertical Federated Learning (VFL), which has drawn substantial attention in both research and application due to its simplicity and efficiency. However, communication efficiency is still a crucial issue for split learning. In this paper, we investigate multiple communication reduction methods for split learning, including cut layer size reduction, top-k sparsification, quantization, and L1 regularization. Through analysis of the cut layer size reduction and top-k sparsification, we further propose randomized top-k sparsification, to make the model generalize and converge better. This is done by selecting top-k elements with a large probability while also having a small probability to select non-top-k elements. Empirical results show that compared with other communication-reduction methods, our proposed randomized top-k sparsification achieves a better model performance under the same compression level.
Robust Representation Learning with Reliable Pseudo-labels Generation via Self-Adaptive Optimal Transport for Short Text Clustering
Zheng, Xiaolin, Hu, Mengling, Liu, Weiming, Chen, Chaochao, Liao, Xinting
Short text clustering is challenging since it takes imbalanced and noisy data as inputs. Existing approaches cannot solve this problem well, since (1) they are prone to obtain degenerate solutions especially on heavy imbalanced datasets, and (2) they are vulnerable to noises. To tackle the above issues, we propose a Robust Short Text Clustering (RSTC) model to improve robustness against imbalanced and noisy data. RSTC includes two modules, i.e., pseudo-label generation module and robust representation learning module. The former generates pseudo-labels to provide supervision for the later, which contributes to more robust representations and correctly separated clusters. To provide robustness against the imbalance in data, we propose self-adaptive optimal transport in the pseudo-label generation module. To improve robustness against the noise in data, we further introduce both class-wise and instance-wise contrastive learning in the robust representation learning module. Our empirical studies on eight short text clustering datasets demonstrate that RSTC significantly outperforms the state-of-the-art models. The code is available at: https://github.com/hmllmh/RSTC.
PPGenCDR: A Stable and Robust Framework for Privacy-Preserving Cross-Domain Recommendation
Liao, Xinting, Liu, Weiming, Zheng, Xiaolin, Yao, Binhui, Chen, Chaochao
Privacy-preserving cross-domain recommendation (PPCDR) refers to preserving the privacy of users when transferring the knowledge from source domain to target domain for better performance, which is vital for the long-term development of recommender systems. Existing work on cross-domain recommendation (CDR) reaches advanced and satisfying recommendation performance, but mostly neglects preserving privacy. To fill this gap, we propose a privacy-preserving generative cross-domain recommendation (PPGenCDR) framework for PPCDR. PPGenCDR includes two main modules, i.e., stable privacy-preserving generator module, and robust cross-domain recommendation module. Specifically, the former isolates data from different domains with a generative adversarial network (GAN) based model, which stably estimates the distribution of private data in the source domain with Renyi differential privacy (RDP) technique. Then the latter aims to robustly leverage the perturbed but effective knowledge from the source domain with the raw data in target domain to improve recommendation performance. Three key modules, i.e., (1) selective privacy preserver, (2) GAN stabilizer, and (3) robustness conductor, guarantee the cost-effective trade-off between utility and privacy, the stability of GAN when using RDP, and the robustness of leveraging transferable knowledge accordingly. The extensive empirical studies on Douban and Amazon datasets demonstrate that PPGenCDR significantly outperforms the state-of-the-art recommendation models while preserving privacy.
INCREASE: Inductive Graph Representation Learning for Spatio-Temporal Kriging
Zheng, Chuanpan, Fan, Xiaoliang, Wang, Cheng, Qi, Jianzhong, Chen, Chaochao, Chen, Longbiao
Spatio-temporal kriging is an important problem in web and social applications, such as Web or Internet of Things, where things (e.g., sensors) connected into a web often come with spatial and temporal properties. It aims to infer knowledge for (the things at) unobserved locations using the data from (the things at) observed locations during a given time period of interest. This problem essentially requires \emph{inductive learning}. Once trained, the model should be able to perform kriging for different locations including newly given ones, without retraining. However, it is challenging to perform accurate kriging results because of the heterogeneous spatial relations and diverse temporal patterns. In this paper, we propose a novel inductive graph representation learning model for spatio-temporal kriging. We first encode heterogeneous spatial relations between the unobserved and observed locations by their spatial proximity, functional similarity, and transition probability. Based on each relation, we accurately aggregate the information of most correlated observed locations to produce inductive representations for the unobserved locations, by jointly modeling their similarities and differences. Then, we design relation-aware gated recurrent unit (GRU) networks to adaptively capture the temporal correlations in the generated sequence representations for each relation. Finally, we propose a multi-relation attention mechanism to dynamically fuse the complex spatio-temporal information at different time steps from multiple relations to compute the kriging output. Experimental results on three real-world datasets show that our proposed model outperforms state-of-the-art methods consistently, and the advantage is more significant when there are fewer observed locations. Our code is available at https://github.com/zhengchuanpan/INCREASE.
Heterogeneous Information Crossing on Graphs for Session-based Recommender Systems
Zheng, Xiaolin, Wu, Rui, Han, Zhongxuan, Chen, Chaochao, Chen, Linxun, Han, Bing
Recommender systems are fundamental information filtering techniques to recommend content or items that meet users' personalities and potential needs. As a crucial solution to address the difficulty of user identification and unavailability of historical information, session-based recommender systems provide recommendation services that only rely on users' behaviors in the current session. However, most existing studies are not well-designed for modeling heterogeneous user behaviors and capturing the relationships between them in practical scenarios. To fill this gap, in this paper, we propose a novel graph-based method, namely Heterogeneous Information Crossing on Graphs (HICG). HICG utilizes multiple types of user behaviors in the sessions to construct heterogeneous graphs, and captures users' current interests with their long-term preferences by effectively crossing the heterogeneous information on the graphs. In addition, we also propose an enhanced version, named HICG-CL, which incorporates contrastive learning (CL) technique to enhance item representation ability. By utilizing the item co-occurrence relationships across different sessions, HICG-CL improves the recommendation performance of HICG. We conduct extensive experiments on three real-world recommendation datasets, and the results verify that (i) HICG achieves the state-of-the-art performance by utilizing multiple types of behaviors on the heterogeneous graph. (ii) HICG-CL further significantly improves the recommendation performance of HICG by the proposed contrastive learning module.
A Unified Framework for Cross-Domain and Cross-System Recommendations
Zhu, Feng, Wang, Yan, Zhou, Jun, Chen, Chaochao, Li, Longfei, Liu, Guanfeng
Cross-Domain Recommendation (CDR) and Cross-System Recommendation (CSR) have been proposed to improve the recommendation accuracy in a target dataset (domain/system) with the help of a source one with relatively richer information. However, most existing CDR and CSR approaches are single-target, namely, there is a single target dataset, which can only help the target dataset and thus cannot benefit the source dataset. In this paper, we focus on three new scenarios, i.e., Dual-Target CDR (DTCDR), Multi-Target CDR (MTCDR), and CDR+CSR, and aim to improve the recommendation accuracy in all datasets simultaneously for all scenarios. To do this, we propose a unified framework, called GA (based on Graph embedding and Attention techniques), for all three scenarios. In GA, we first construct separate heterogeneous graphs to generate more representative user and item embeddings. Then, we propose an element-wise attention mechanism to effectively combine the embeddings of common entities (users/items) learned from different datasets. Moreover, to avoid negative transfer, we further propose a Personalized training strategy to minimize the embedding difference of common entities between a richer dataset and a sparser dataset, deriving three new models, i.e., GA-DTCDR-P, GA-MTCDR-P, and GA-CDR+CSR-P, for the three scenarios respectively. Extensive experiments conducted on four real-world datasets demonstrate that our proposed GA models significantly outperform the state-of-the-art approaches.
Cross-Domain Recommendation: Challenges, Progress, and Prospects
Zhu, Feng, Wang, Yan, Chen, Chaochao, Zhou, Jun, Li, Longfei, Liu, Guanfeng
To address the long-standing data sparsity problem in recommender systems (RSs), cross-domain recommendation (CDR) has been proposed to leverage the relatively richer information from a richer domain to improve the recommendation performance in a sparser domain. Although CDR has been extensively studied in recent years, there is a lack of a systematic review of the existing CDR approaches. To fill this gap, in this paper, we provide a comprehensive review of existing CDR approaches, including challenges, research progress, and future directions. Specifically, we first summarize existing CDR approaches into four types, including single-target CDR, multi-domain recommendation, dual-target CDR, and multi-target CDR. We then present the definitions and challenges of these CDR approaches. Next, we propose a full-view categorization and new taxonomies on these approaches and report their research progress in detail. In the end, we share several promising research directions in CDR.
Improving Federated Relational Data Modeling via Basis Alignment and Weight Penalty
Lin, Yilun, Chen, Chaochao, Chen, Cen, Wang, Li
Federated learning (FL) has attracted increasing attention in recent years. As a privacy-preserving collaborative learning paradigm, it enables a broader range of applications, especially for computer vision and natural language processing tasks. However, to date, there is limited research of federated learning on relational data, namely Knowledge Graph (KG). In this work, we present a modified version of the graph neural network algorithm that performs federated modeling over KGs across different participants. Specifically, to tackle the inherent data heterogeneity issue and inefficiency in algorithm convergence, we propose a novel optimization algorithm, named FedAlign, with 1) optimal transportation (OT) for on-client personalization and 2) weight constraint to speed up the convergence. Extensive experiments have been conducted on several widely used datasets. Empirical results show that our proposed method outperforms the state-of-the-art FL methods, such as FedAVG and FedProx, with better convergence.