Tang, Xinyu
Price of Stability in Quality-Aware Federated Learning
Yan, Yizhou, Tang, Xinyu, Huang, Chao, Tang, Ming
Federated Learning (FL) is a distributed machine learning scheme that enables clients to train a shared global model without exchanging local data. The presence of label noise can severely degrade FL performance, and some existing studies have focused on algorithm design for label denoising. However, they ignore the important issue that clients may not apply costly label denoising strategies, because they are self-interested and have heterogeneous valuations of the FL performance. To fill this gap, we model the clients' interactions as a novel label denoising game and characterize its equilibrium. We also analyze the price of stability, which quantifies the difference in system performance (e.g., global model accuracy, social welfare) between the equilibrium outcome and the socially optimal solution. We prove that the equilibrium outcome always leads to a lower global model accuracy than the socially optimal solution does. We further design an efficient algorithm to compute the socially optimal solution. Numerical experiments on the MNIST dataset show that the price of stability increases as the clients' data become noisier, calling for an effective incentive mechanism.
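For concreteness, the price of stability (PoS) is conventionally defined as the ratio between the social welfare of the best equilibrium and the optimal social welfare; the abstract does not state which normalization the paper uses, so the convention below (PoS <= 1, with smaller values meaning a larger efficiency loss at equilibrium) is an assumption:

\mathrm{PoS} = \frac{\max_{s \in \mathcal{E}} \, SW(s)}{\max_{s \in \mathcal{S}} \, SW(s)},

where \mathcal{S} denotes the set of all denoising strategy profiles, \mathcal{E} \subseteq \mathcal{S} the set of equilibria of the label denoising game, and SW(\cdot) the social welfare (or, analogously, the global model accuracy).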
Improving Conversational Recommendation Systems via Counterfactual Data Simulation
Wang, Xiaolei, Zhou, Kun, Tang, Xinyu, Zhao, Wayne Xin, Pan, Fan, Cao, Zhao, Wen, Ji-Rong
Conversational recommender systems (CRSs) aim to provide recommendation services via natural language conversations. Although a number of approaches have been proposed for developing capable CRSs, they typically require sufficient training data. Since recommendation-oriented dialogue datasets are difficult to annotate, existing CRS approaches often suffer from insufficient training due to data scarcity. To address this issue, in this paper we propose a CounterFactual data simulation approach for CRS, named CFCRS, to alleviate data scarcity in CRSs. Our approach builds on the framework of counterfactual data augmentation, which gradually rewrites the user preference of a real dialogue without disrupting the overall conversation flow. To develop our approach, we characterize user preference and organize the conversation flow by the entities involved in the dialogue, and design a multi-stage recommendation dialogue simulator based on a conversation flow language model. Under the guidance of the learned user preference and dialogue schema, the flow language model produces reasonable, coherent conversation flows, which are further realized into complete dialogues. Based on the simulator, we intervene on the representations of the entities that target users have interacted with, and design an adversarial training method with a curriculum schedule that gradually optimizes the data augmentation strategy. Extensive experiments show that our approach consistently boosts the performance of several competitive CRSs and outperforms other data augmentation methods, especially when training data is limited. Our code is publicly available at https://github.com/RUCAIBox/CFCRS.
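The abstract does not give implementation details; purely as an illustration, the counterfactual intervention on entity representations can be pictured as editing one interacted entity's embedding toward a sampled counterfactual entity. The function name, the interpolation form, and the fixed coefficient alpha below are hypothetical; in CFCRS the augmentation strategy is learned via adversarial training with a curriculum schedule rather than fixed by hand.

import numpy as np

def counterfactual_intervention(entity_embs, target_idx, candidate_embs, alpha=0.3, rng=None):
    # Replace one interacted entity's representation with a convex interpolation
    # toward a randomly sampled counterfactual entity (illustrative sketch only).
    rng = np.random.default_rng() if rng is None else rng
    edited = entity_embs.copy()
    counterfactual = candidate_embs[rng.integers(len(candidate_embs))]
    edited[target_idx] = (1.0 - alpha) * entity_embs[target_idx] + alpha * counterfactual
    return edited

# Example: 5 interacted entities with 16-dim embeddings, 100 candidate entities.
user_entities = np.random.randn(5, 16)
candidates = np.random.randn(100, 16)
augmented = counterfactual_intervention(user_entities, target_idx=2, candidate_embs=candidates)

The edited entity representations would then condition the conversation flow language model, which realizes the counterfactual flow into a complete simulated dialogue.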
DP-RAFT: A Differentially Private Recipe for Accelerated Fine-Tuning
Panda, Ashwinee, Tang, Xinyu, Sehwag, Vikash, Mahloujifar, Saeed, Mittal, Prateek
As organizations increasingly use machine learning in real-world systems to provide insights on the data generated by real users (Team, 2017), issues of user data privacy have risen to the forefront of existing problems in machine learning. Differential privacy (DP) (Dwork et al., 2006) is the de facto standard for privacy-preserving statistics. Common algorithms for privately training machine learning models are differentially private stochastic gradient descent (DP-SGD) (Song et al., 2013; Abadi et al., 2016) and differentially private empirical risk minimization (DP-ERM) (Chaudhuri et al., 2011). While advancements in deep learning can be partially attributed to scaling up the number of model parameters (Kaplan et al., 2020; Brown et al., 2020), increasing the number of model parameters in DP-SGD often has an adverse impact on the privacy-utility tradeoff (Kurakin et al., 2022; Tramèr and Boneh, 2020; Yu et al., 2021b; Shen et al., 2021) due to the curse of dimensionality present in DP-SGD. Briefly, the "curse" is that the magnitude of the added noise scales with the square root of the number of parameters d, while the signal does not scale with the number of parameters, so the signal-to-noise ratio (SNR) suffers at scale.
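To make the scaling explicit, here is the standard DP-SGD noise argument, a sketch following the usual Gaussian-mechanism analysis (Abadi et al., 2016), with batch size B, clipping norm C, noise multiplier \sigma, and parameter dimension d:

\tilde{g} = \frac{1}{B}\left( \sum_{i=1}^{B} \mathrm{clip}_C(g_i) + z \right), \qquad z \sim \mathcal{N}(0, \sigma^2 C^2 I_d), \qquad \mathbb{E}\,\|z\|_2 \approx \sigma C \sqrt{d}.

The clipped signal \sum_i \mathrm{clip}_C(g_i) has norm at most BC regardless of d, while the noise norm grows like \sqrt{d}, so the per-step SNR decays roughly as B / (\sigma \sqrt{d}) as the model grows.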