
Collaborating Authors

 An, Xuming


Federated Learning with Only Positive Labels by Exploring Label Correlations

arXiv.org Artificial Intelligence

Federated learning (FL) [1] is a novel machine learning paradigm that trains an algorithm across multiple decentralized clients (such as edge devices) or servers without exchanging local data samples. Since clients can only access their local datasets, the user's privacy can be well protected, and this paradigm has attracted increasing attention in recent years [2]-[4]. In this paper, we study the challenging problem of learning a multi-label classification model [5], [6] under the federated learning setting, where each user has only local positive data related to a single class label [7]. This setting can be treated as the extreme label-skew case in the data heterogeneity of federated learning, which is common in real-world applications.

This approach, however, treats different labels equally in the spreadout (class embedding separation) process. That is, embeddings of class labels that are highly correlated and those that are significantly different in the multi-label space are separated in the same way. This is not reasonable, since embeddings should be close for correlated labels and dissimilar otherwise. For example, suppose the class labels 'Desktop computer' and 'Desk' often appear in the same instance; the two corresponding class embedding vectors can then be deemed highly correlated and may be relatively close compared with others, such as the class labels 'aircraft', 'automobile', etc. Besides, since the instance and class embeddings are trained on clients and ...
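The excerpt above contrasts uniform spreadout of class embeddings with correlation-aware separation. Below is a minimal NumPy sketch of that general idea, assuming a hypothetical `correlation_weighted_spreadout` penalty in which the push-apart weight for a label pair shrinks as their co-occurrence correlation grows; the function name, margin form, and toy data are assumptions for illustration, not the paper's actual objective.

```python
# Minimal sketch (not the paper's objective): a correlation-weighted spreadout
# regularizer. Labels that frequently co-occur (high correlation) are separated
# less aggressively than unrelated labels.
import numpy as np

def correlation_weighted_spreadout(class_emb, label_corr, margin=1.0):
    """class_emb: (C, d) class embedding matrix; label_corr: (C, C) in [0, 1]."""
    C = class_emb.shape[0]
    loss = 0.0
    for i in range(C):
        for j in range(C):
            if i == j:
                continue
            sim = float(class_emb[i] @ class_emb[j])       # inner-product similarity
            weight = 1.0 - label_corr[i, j]                 # correlated pairs -> small weight
            loss += weight * max(0.0, sim - (1.0 - margin)) ** 2  # hinge-style separation
    return loss / (C * (C - 1))

# Toy usage: 4 labels; labels 0 and 1 ('Desktop computer', 'Desk') are highly correlated.
rng = np.random.default_rng(0)
emb = rng.normal(size=(4, 8))
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
corr = np.eye(4)
corr[0, 1] = corr[1, 0] = 0.9
print(correlation_weighted_spreadout(emb, corr))
```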


Federated Learning with Manifold Regularization and Normalized Update Reaggregation

arXiv.org Artificial Intelligence

Federated Learning (FL) is an emerging collaborative machine learning framework in which multiple clients train a global model without sharing their own datasets. In FL, the model inconsistency caused by local data heterogeneity across clients results in near-orthogonality of the client updates, which reduces the norm of the global update and slows down convergence. Most previous works focus on eliminating the difference in parameters (or gradients) between the local and global models, which may fail to reflect the model inconsistency due to the complex structure of the machine learning model and the limitations of Euclidean space in providing meaningful geometric representations. In this paper, we propose FedMRUR, which adopts a manifold model fusion scheme and a new global optimizer to alleviate these negative impacts. Concretely, FedMRUR adopts a hyperbolic graph manifold regularizer that enforces that the representations of the data in the local and global models are close to each other in a low-dimensional subspace. Because the machine learning model has a graph structure, distance in hyperbolic space can reflect the model bias better than Euclidean distance. In this way, FedMRUR exploits the manifold structure of the representations to significantly reduce the model inconsistency. FedMRUR also aggregates the norms of the client updates into the global update norm, which appropriately enlarges each client's contribution to the global update, thereby mitigating the norm reduction introduced by the near-orthogonality of client updates. Furthermore, we theoretically prove that our algorithm achieves a linear speedup property in the non-convex setting under partial client participation. Experiments demonstrate that FedMRUR achieves new state-of-the-art (SOTA) accuracy with less communication.
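The two mechanisms described in the abstract, a hyperbolic-space penalty on the local-global representation gap and a norm-preserving re-aggregation of client updates, can be sketched in a few lines. The sketch below uses a Poincare-ball distance and a simple rescaling of the averaged update to the mean client-update norm; the function names, the projection into the ball, and the choice of target norm are assumptions for illustration, not FedMRUR's exact formulation.

```python
# Hedged sketch under simplified assumptions (not FedMRUR itself):
# (1) penalize the local-global representation gap with a hyperbolic distance,
# (2) re-aggregate so the averaged direction keeps the clients' update magnitude.
import numpy as np

def poincare_distance(x, y, eps=1e-7):
    """Hyperbolic distance between two points inside the unit Poincare ball."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2)) + eps
    return np.arccosh(1.0 + 2.0 * sq / denom)

def manifold_regularizer(local_repr, global_repr, scale=0.9):
    """Map representations into the ball, then penalize their hyperbolic distance."""
    to_ball = lambda v: scale * v / (1.0 + np.linalg.norm(v))
    return poincare_distance(to_ball(local_repr), to_ball(global_repr))

def reaggregate(client_updates):
    """Average client updates, then rescale to the mean client-update norm so that
    near-orthogonal updates do not shrink the global step."""
    mean_update = np.mean(client_updates, axis=0)
    target_norm = np.mean([np.linalg.norm(u) for u in client_updates])
    return mean_update / (np.linalg.norm(mean_update) + 1e-12) * target_norm

rng = np.random.default_rng(0)
updates = [rng.normal(size=10) for _ in range(5)]
print(np.linalg.norm(np.mean(updates, axis=0)), np.linalg.norm(reaggregate(updates)))
print(manifold_regularizer(rng.normal(size=16), rng.normal(size=16)))
```

On random, nearly orthogonal updates, the plain average has a much smaller norm than any single update, while the re-aggregated step recovers the clients' typical magnitude.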


Over-the-Air Computation Aided Federated Learning with the Aggregation of Normalized Gradient

arXiv.org Artificial Intelligence

Over-the-air computation is a communication-efficient solution for federated learning (FL). In such a system, an iterative procedure is performed: the local gradient of the private loss function is computed, amplified, and then transmitted by every mobile device; the server receives the aggregated gradient all at once, then generates and broadcasts updated model parameters to every mobile device. In terms of amplification factor selection, most related works assume that the local gradient's maximal norm is always attained, although it actually fluctuates over iterations, which may degrade convergence performance. To circumvent this problem, we propose to normalize the local gradient before amplifying it. We prove that, when the loss function is smooth, our proposed method converges to a stationary point at a sub-linear rate. When the loss function is smooth and strongly convex, we prove that our proposed method achieves the minimal training loss at a linear rate up to an arbitrarily small positive tolerance. Moreover, a tradeoff between the convergence rate and the tolerance is revealed. To speed up convergence, we also formulate problems that optimize the system parameters for the above two cases. Although these problems are non-convex, their optimal solutions are derived with polynomial complexity. Experimental results show that our proposed method outperforms benchmark methods in convergence performance.
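The core loop described here, normalize the local gradient, amplify, transmit, and let the channel sum the signals, can be illustrated with a toy simulation. The sketch below assumes an additive-noise channel with unit gains (not the paper's fading model), a least-squares local loss, and made-up names `local_gradient` and `ota_round`; it shows the mechanism only, not the paper's amplification-factor optimization.

```python
# Illustrative sketch with simplified (noise-only, unit-gain) channels: each device
# transmits an amplified *normalized* gradient, the server receives the superposed
# signal all at once, de-amplifies, and updates the model.
import numpy as np

def local_gradient(w, X, y):
    """Gradient of a least-squares loss 0.5*||Xw - y||^2 on one device's data."""
    return X.T @ (X @ w - y)

def ota_round(w, devices, amp=5.0, noise_std=0.01, lr=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    tx = []
    for X, y in devices:
        g = local_gradient(w, X, y)
        g = g / (np.linalg.norm(g) + 1e-12)   # normalize: bounded transmit power per round
        tx.append(amp * g)                     # amplify before transmission
    received = np.sum(tx, axis=0) + rng.normal(scale=noise_std, size=w.shape)  # over-the-air sum
    agg = received / (amp * len(devices))      # server de-amplifies and averages
    return w - lr * agg

rng = np.random.default_rng(0)
devices = [(rng.normal(size=(20, 5)), rng.normal(size=20)) for _ in range(4)]
w = np.zeros(5)
for _ in range(50):
    w = ota_round(w, devices, rng=rng)
print(w)
```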


Joint Power Control and Data Size Selection for Over-the-Air Computation Aided Federated Learning

arXiv.org Artificial Intelligence

Federated learning (FL) has emerged as an appealing machine learning approach for dealing with the massive raw data generated at multiple mobile devices, which requires iteratively aggregating the training model parameters of every mobile device at one base station (BS). For parameter aggregation in FL, over-the-air computation is a spectrum-efficient solution, which allows all mobile devices to transmit their parameter-mapped signals concurrently to the BS. Due to heterogeneous channel fading and noise, there exists a difference between the BS's received signal and its desired signal, measured as the mean-squared error (MSE). To minimize the MSE, we propose to jointly optimize the signal amplification factors at the BS and the mobile devices as well as the data size (the number of data samples involved in local training) at every mobile device. The formulated problem is challenging to solve due to its non-convexity. To find the optimal solution, we simplify the cost function and apply a variable replacement, both of which preserve equivalence, and then transform the problem into an equivalent bi-level problem. For the lower-level problem, the optimal solution is found by enumerating every candidate solution from the Karush-Kuhn-Tucker (KKT) conditions. For the upper-level problem, the optimal solution is found by exploiting its piecewise convexity. Numerical results show that our proposed method can greatly reduce the MSE and help improve the training performance of FL compared with benchmark methods.
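To make the MSE objective concrete, the sketch below writes out one plausible form of it: the desired signal is a data-size-weighted sum of local parameters, and the received signal is the channel-faded, amplified superposition plus noise, scaled at the BS. The function `expected_mse`, the unit-variance parameter assumption, the power cap, and the closed-form choice of the BS scaling factor are all illustrative assumptions; the sketch does not reproduce the paper's bi-level solution method.

```python
# Hedged sketch of an over-the-air aggregation MSE objective (assumed form, not the
# paper's exact model or its KKT/piecewise-convex solution procedure).
import numpy as np

def expected_mse(h, b, eta, k, sigma2):
    """
    h:      (N,) channel gains
    b:      (N,) transmit amplification factors
    eta:    BS receive scaling factor
    k:      (N,) data sizes (number of local samples per device)
    sigma2: receiver noise power per entry
    Returns E|| eta*(sum_n h_n*b_n*theta_n + z) - sum_n k_n*theta_n ||^2 per entry,
    assuming independent, unit-variance parameter entries theta_n.
    """
    misalignment = np.sum((eta * h * b - k) ** 2)  # per-device signal mismatch
    noise_term = eta ** 2 * sigma2                 # amplified receiver noise
    return misalignment + noise_term

rng = np.random.default_rng(0)
N = 5
h = np.abs(rng.normal(size=N)) + 0.1
k = rng.integers(50, 200, size=N).astype(float)
# Naive baseline: invert the channel up to a power cap, then pick eta by least squares
# (minimizer of the quadratic MSE in eta for sigma2 = 1.0).
b = np.minimum(k / h, 10.0)
eta = np.sum(h * b * k) / (np.sum((h * b) ** 2) + 1.0)
print(expected_mse(h, b, eta, k, sigma2=1.0))
```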