A Assumptions and Theoretical Results

A.1 Assumptions of risk functions

Definition 1

Neural Information Processing Systems 

A function $f$ has an $L$-Lipschitz continuous gradient if there exists a constant $L > 0$ such that
$$\|\nabla f(x) - \nabla f(y)\| \le L \|x - y\|, \quad \forall x, y.$$
If $f$ is $m$-strongly convex and has an $L$-Lipschitz continuous gradient, then it is immediate that $m \le L$. Let $\lambda$ be the Lagrange multiplier. Using Jensen's inequality, we have $r$

We next prove the convergence of the algorithm under the proposed weight assignment rule. An edge between two agents means they are neighbors. This models the realistic scenario in which some agents have fewer data samples and may therefore learn more slowly than others.
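As a numerical illustration of Definition 1 and the claim $m \le L$, the sketch below uses a hypothetical quadratic $f(x) = \tfrac{1}{2} x^\top A x$ with $A$ symmetric positive definite, for which $\nabla f(x) = Ax$, the strong-convexity constant is $m = \lambda_{\min}(A)$, and the gradient-Lipschitz constant is $L = \lambda_{\max}(A)$. The matrix and dimensions are assumptions for the example only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical quadratic f(x) = 0.5 x^T A x with A symmetric positive definite:
# grad f(x) = A x, so f is m-strongly convex with m = lambda_min(A)
# and its gradient is L-Lipschitz with L = lambda_max(A); hence m <= L.
B = rng.standard_normal((5, 5))
A = B @ B.T + 5 * np.eye(5)          # symmetric positive definite by construction

eigs = np.linalg.eigvalsh(A)         # eigenvalues in ascending order
m, L = eigs[0], eigs[-1]
assert m <= L                        # the inequality from the text

# Spot-check the Lipschitz-gradient inequality on random point pairs.
grad = lambda x: A @ x
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    lhs = np.linalg.norm(grad(x) - grad(y))
    rhs = L * np.linalg.norm(x - y)
    assert lhs <= rhs + 1e-9
```

For a linear gradient map the inequality holds with equality in the worst case, so $L = \lambda_{\max}(A)$ is the tightest possible constant here.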
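The paper's specific weight assignment rule is not reproduced in this excerpt; as an illustrative stand-in, the sketch below builds Metropolis weights, a common doubly stochastic choice for consensus averaging, on a hypothetical 4-agent graph where an edge $A_{ij} = 1$ means agents $i$ and $j$ are neighbors. Both the graph and the rule are assumptions for illustration, not the algorithm analyzed in the text.

```python
import numpy as np

# Hypothetical undirected communication graph over 4 agents;
# A[i, j] = 1 means agents i and j are neighbors.
A = np.array([[0, 1, 1, 0],
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
deg = A.sum(axis=1)                  # degree of each agent

n = A.shape[0]
W = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        if A[i, j]:                  # neighbors share weight 1 / (1 + max degree)
            W[i, j] = 1.0 / (1.0 + max(deg[i], deg[j]))
    W[i, i] = 1.0 - W[i].sum()       # self-weight makes each row sum to 1

# Metropolis weights are symmetric and doubly stochastic.
assert np.allclose(W, W.T)
assert np.allclose(W.sum(axis=1), 1.0)
```

Double stochasticity is what lets repeated local averaging $x \leftarrow Wx$ preserve the network-wide mean while mixing information between neighbors, which is the typical setting for convergence arguments of this kind.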