FedSplit: an algorithmic framework for fast federated optimization
Motivated by federated learning, we consider the hub-and-spoke model of distributed optimization in which a central authority coordinates the computation of a solution among many agents while limiting communication. We first study some past procedures for federated optimization, and show that their fixed points need not correspond to stationary points of the original optimization problem, even in simple convex settings with deterministic updates. In order to remedy these issues, we introduce FedSplit, a class of algorithms based on operator splitting procedures for solving distributed convex minimization with additive structure. We prove that these procedures have the correct fixed points, corresponding to optima of the original optimization problem, and we characterize their convergence rates under different settings. Our theory shows that these methods are provably robust to inexact computation of intermediate local quantities. We complement our theory with some experiments that demonstrate the benefits of our methods in practice.
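The operator-splitting iteration described in this abstract can be illustrated concretely. The sketch below is a Peaceman–Rachford-style consensus splitting on quadratic client objectives, where the proximal step has a closed form; the step size `s`, the randomly generated quadratics, and the variable names are illustrative assumptions, not the paper's exact pseudocode.

```python
import numpy as np

# Hedged sketch of a FedSplit-style iteration (Peaceman-Rachford consensus
# splitting) on quadratic client objectives f_j(x) = 0.5 x'A_j x - b_j'x,
# where prox_{s f_j}(v) has the closed form (I + s A_j)^{-1} (v + s b_j).

rng = np.random.default_rng(0)
d, m, s = 3, 2, 0.5  # dimension, number of clients, step size (illustrative)

# Random strongly convex quadratics, one per client.
A = [np.eye(d) + M @ M.T for M in (rng.standard_normal((d, d)) for _ in range(m))]
b = [rng.standard_normal(d) for _ in range(m)]

def prox(j, v):
    """Closed-form prox_{s f_j}(v) for the quadratic f_j."""
    return np.linalg.solve(np.eye(d) + s * A[j], v + s * b[j])

z = [np.zeros(d) for _ in range(m)]
for _ in range(300):
    xbar = sum(z) / m                    # server: average the z_j, broadcast
    for j in range(m):
        half = prox(j, 2 * xbar - z[j])  # client: local proximal step
        z[j] = z[j] + 2 * (half - xbar)  # client: local centering / reflection

xbar = sum(z) / m
x_star = np.linalg.solve(sum(A), sum(b))  # minimizer of the global sum
print(float(np.linalg.norm(xbar - x_star)))  # distance to the true optimum
```

At a fixed point of this iteration each local prox step returns the server average, which forces the client gradients to sum to zero there; so, unlike the averaging-based schemes criticized above, the fixed point is a minimizer of the original sum.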
A Related Work on Compression Techniques in Distributed Optimization and Learning
The analysis in this section follows the techniques introduced in [8]. The proof of Proposition 2 follows roughly the same steps as the proof of Proposition 1. In this section, we compile some results that will prove useful later in our analysis. With this in mind, we assume throughout this section that all clients perform the same number of local updates. To that end, we will make use of the following lemma. To prove Theorem 5, we will construct an example involving two clients.
Review for NeurIPS paper: FedSplit: an algorithmic framework for fast federated optimization
Weaknesses: Main criticism: 1) The paper claims two main contributions, one of which is: "The first contribution of this paper is to analyze some past procedures, and show that even in the favorable setting of deterministic updates (i.e., no stochastic approximation used), these methods typically fail to preserve solutions of the original optimization problem as fixed points." I believe this text is misleading. In fact, it was already well known that the "past procedures" do not have the correct fixed points; one alternative approach to dealing with this issue is to incorporate the "drift"; see https://arxiv.org/abs/1910.06378 for example. Therefore, I believe it would be more appropriate not to claim showing the wrong fixed points of the local algorithms as a contribution. Specifically, in the strongly convex case (the same argument applies in the weakly convex case), the communication complexity of FedSplit is O(sqrt(kappa) log(1/epsilon)), which is identical to the communication complexity of AGD. In fact, AGD is favorable (in terms of the rate), as it requires only a single gradient evaluation per round instead of evaluating the prox with high enough precision that the inexactness does not degrade the rate.
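For reference, the rate comparison invoked here can be written out. For an $L$-smooth, $\mu$-strongly convex objective with condition number $\kappa = L/\mu$, accelerated gradient descent reaches an $\epsilon$-accurate solution in $\mathcal{O}(\sqrt{\kappa}\log(1/\epsilon))$ iterations at one gradient evaluation each, matching the communication-round count the reviewer quotes for FedSplit:

```latex
T_{\mathrm{AGD}}(\epsilon)
  = \mathcal{O}\!\left(\sqrt{\kappa}\,\log\tfrac{1}{\epsilon}\right),
\qquad
T_{\mathrm{FedSplit}}(\epsilon)
  = \mathcal{O}\!\left(\sqrt{\kappa}\,\log\tfrac{1}{\epsilon}\right),
\qquad
\kappa = \frac{L}{\mu}.
```

The reviewer's point is that with identical round counts, the per-round cost (one gradient versus an inexact prox solve) decides the comparison.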
An Operator Splitting View of Federated Learning
Malekmohammadi, Saber, Shaloudegi, Kiarash, Hu, Zeou, Yu, Yaoliang
Over the past few years, the federated learning ($\texttt{FL}$) community has witnessed a proliferation of new $\texttt{FL}$ algorithms. However, our understanding of the theory of $\texttt{FL}$ is still fragmented, and a thorough, formal comparison of these algorithms remains elusive. Motivated by this gap, we show that many of the existing $\texttt{FL}$ algorithms can be understood from an operator splitting point of view. This unification allows us to compare different algorithms with ease, to refine previous convergence results, and to uncover new algorithmic variants. In particular, our analysis reveals the vital role played by the step size in $\texttt{FL}$ algorithms. The unification also leads to a streamlined and economical way to accelerate $\texttt{FL}$ algorithms, without incurring any communication overhead. We perform numerical experiments on both convex and nonconvex models to validate our findings.
FedSplit: An algorithmic framework for fast federated optimization
Pathak, Reese, Wainwright, Martin J.
Federated learning is a rapidly evolving application of distributed optimization for estimation and learning problems in large-scale networks of remote clients [13]. These systems present new challenges, as they are characterized by heterogeneity in computational resources and data across the network, unreliable communication, massive scale, and privacy constraints [16]. A typical application is for developers of cell phones and cellular applications to model the usage of software and devices across millions or even billions of users. Distributed optimization has a rich history and extensive literature (e.g., see the sources [2, 5, 8, 31, 15, 24] and references therein), and federated learning has led to a flurry of interest in the area. A number of different procedures have been proposed for federated learning and related problems, using methods based on stochastic gradient methods or proximal procedures. Notably, McMahan et al. [18] introduced the FedSGD and FedAvg algorithms, which both adapt the classical stochastic gradient method to the federated setting, considering the possibility that clients may fail and may only be subsampled on each round of computation. Another recent proposal has been to use regularized local problems to mitigate possible issues that arise with device heterogeneity and failures [17]. These authors propose the FedProx procedure, an algorithm that applies averaged proximal updates to solve federated minimization problems. However, the convergence theory and correctness of these methods is still lacking, and practitioners have documented failures of convergence in certain settings (e.g., see Figure 3 and related discussion in the work [18]).
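The fixed-point concern raised about averaged proximal updates can be checked numerically. Below is a minimal sketch assuming deterministic full-batch updates and two hand-picked quadratic clients (all constants are illustrative): iterating the averaged proximal map $x \mapsto \frac{1}{m}\sum_j \mathrm{prox}_{s f_j}(x)$, the deterministic skeleton of a FedProx-style update, converges to a point that is not the minimizer of $f_1 + f_2$ when the clients are heterogeneous.

```python
import numpy as np

# Two heterogeneous quadratic clients f_j(x) = 0.5 x'A_j x - b_j'x.
A = [np.diag([2.0, 1.0]), np.diag([1.0, 3.0])]
b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
s = 1.0  # illustrative step size

def prox(j, v):
    # Closed-form prox of a quadratic: (I + s A_j)^{-1} (v + s b_j)
    return np.linalg.solve(np.eye(2) + s * A[j], v + s * b[j])

# Iterate the averaged proximal map x <- 0.5 * (prox_1(x) + prox_2(x)),
# which is a contraction here, so it converges to its unique fixed point.
x = np.zeros(2)
for _ in range(1000):
    x = 0.5 * (prox(0, x) + prox(1, x))

# True minimizer of f_1 + f_2: solve (A_1 + A_2) x = b_1 + b_2.
x_star = np.linalg.solve(A[0] + A[1], b[0] + b[1])
print(x, x_star)  # the fixed point is visibly biased away from x_star
```

For these clients the averaged-prox fixed point works out to $(2/7,\,1/5)$ while the true minimizer is $(1/3,\,1/4)$, and the gap persists for any fixed $s > 0$; the bias vanishes only when the clients are identical.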