Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach

Han, Pengchao, Wang, Shiqiang, Leung, Kin K.

Jan-16-2020–arXiv.org Machine Learning

Federated learning (FL) is an emerging technique for training machine learning models using geographically dispersed data collected by local entities. It includes local computation and synchronization steps. To reduce the communication overhead and improve the overall efficiency of FL, gradient sparsification (GS) can be applied, where only a small subset of important elements of the gradient is communicated instead of the full gradient. Existing work on GS uses a fixed degree of gradient sparsity for i.i.d.-distributed data within a datacenter. In this paper, we consider an adaptive degree of sparsity and non-i.i.d. data. We first present a fairness-aware GS method which ensures that different clients provide a similar amount of updates. Then, with the goal of minimizing the overall training time, we propose a novel online learning formulation and algorithm for automatically determining the near-optimal communication and computation tradeoff that is controlled by the degree of gradient sparsity. The online learning algorithm uses an estimated sign of the derivative of the objective function, which gives a regret bound that is asymptotically equal to the case where the exact derivative is available. Experiments with real datasets confirm the benefits of our proposed approaches, showing up to 40% improvement in model accuracy for a finite training time.

Modern consumer and enterprise users generate a large amount of data at the network edge, such as sensor measurements from Internet of Things (IoT) devices, images captured by cameras, and transaction records of different branches of a company. Such data may not be shareable with a central cloud due to data privacy regulations and limited communication bandwidth [1]. In these scenarios, federated learning (FL) is a useful approach for training machine learning models from local data [1]-[5]. The basic process of FL includes local gradient computation at clients and model weight (parameter) aggregation through a server.
Instead of sharing the raw data, only model weights or gradients need to be shared between the clients and the server in the FL process.
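The abstract describes gradient sparsification only at a high level ("only a small subset of important elements of the gradient is communicated"). One common way to choose those elements is top-k selection by magnitude, with the untransmitted entries kept in a local residual so they can be sent in a later round. The sketch below assumes that variant. The function name `top_k_sparsify` and the residual handling are illustrative assumptions. It is not the paper's fairness-aware method, which additionally balances how many elements each client contributes.

```python
from typing import Dict, List, Tuple


def top_k_sparsify(
    gradient: List[float], residual: List[float], k: int
) -> Tuple[Dict[int, float], List[float]]:
    """Keep only the k largest-magnitude entries of (gradient + residual).

    Illustrative sketch (assumed top-k with residual accumulation, not the
    paper's fairness-aware GS). Returns the sparse update as {index: value}
    and the new residual holding every entry that was not transmitted.
    """
    if len(gradient) != len(residual):
        raise ValueError("gradient and residual must have the same length")
    if not 0 <= k <= len(gradient):
        raise ValueError("k must be between 0 and the gradient length")

    # Add back what was withheld in earlier rounds so it is not lost for good.
    accumulated = [g + r for g, r in zip(gradient, residual)]

    # Indices of the k entries with the largest absolute value.
    top = sorted(
        range(len(accumulated)), key=lambda i: abs(accumulated[i]), reverse=True
    )[:k]
    sparse = {i: accumulated[i] for i in top}

    # Entries that were not sent stay in the residual for the next round.
    new_residual = [0.0 if i in sparse else v for i, v in enumerate(accumulated)]
    return sparse, new_residual
```

Calling the function over successive rounds shows the residual at work: an entry withheld in one round is transmitted later once its accumulated magnitude is among the k largest. Here k plays the role of the "degree of gradient sparsity" that the paper proposes to adapt rather than fix.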

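The abstract states that the online learning algorithm tunes the degree of sparsity using only an estimated sign of the derivative of the objective, but it does not give the update rule. The sketch below therefore assumes a generic projected sign-based online gradient descent over a continuous sparsity degree k in [k_min, k_max], with step size step/√t. The callback `estimate_derivative_sign` and all parameter names are hypothetical. This is not claimed to be the paper's algorithm or to reproduce its regret bound.

```python
import math
from typing import Callable, List


def sign_online_descent(
    estimate_derivative_sign: Callable[[int, float], int],
    k_min: float,
    k_max: float,
    k_init: float,
    step: float,
    rounds: int,
) -> List[float]:
    """Tune a sparsity degree k online using only the sign of a derivative estimate.

    estimate_derivative_sign(t, k) is a hypothetical callback returning -1, 0,
    or +1, the estimated sign of d(cost)/dk at round t. Returns the trajectory
    of k, starting with the (projected) initial value.
    """
    k = min(max(k_init, k_min), k_max)
    history = [k]
    for t in range(1, rounds + 1):
        s = estimate_derivative_sign(t, k)
        # Assumed step schedule: shrinks as 1/sqrt(t), common in online gradient descent.
        k = k - (step / math.sqrt(t)) * s
        # Project back onto the feasible interval [k_min, k_max].
        k = min(max(k, k_min), k_max)
        history.append(k)
    return history


if __name__ == "__main__":
    # Synthetic example: pretend the training-time cost is minimized at k = 30.
    def toy_sign(t: int, k: float) -> int:
        return (k > 30) - (k < 30)

    trace = sign_online_descent(toy_sign, k_min=1, k_max=100, k_init=100, step=20, rounds=200)
    print(round(trace[-1], 2))
```

Only the sign of the derivative is used, so the step length never depends on how steep the (unknown) cost is. With the toy cost above, the iterate crosses the optimum and then oscillates around it with an amplitude bounded by the current step size. In practice k would be rounded to an integer number of gradient elements before use.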
communication time, gradient, training time

Country:
- North America
  - United States (0.46)
  - Canada > Ontario
    - Toronto (0.14)
- Europe > United Kingdom
  - England > Greater London > London (0.04)

Genre:
- Research Report (0.40)

Industry:
- Information Technology > Security & Privacy (1.00)
- Education > Educational Setting
  - Online (0.83)

Technology:
- Information Technology
  - Artificial Intelligence > Machine Learning (1.00)
  - Enterprise Applications > Human Resources
    - Learning Management (0.83)
