

Experiments and Additional Results

Neural Information Processing Systems

Note that $f(x, c_1, c_2, \cdot)$ is strongly concave for any $(x, c_1, c_2) \in \mathbb{R}^{d+2}$.

Impact of the Local Steps: In this section, we run additional experiments to investigate the impact of the number of local steps K on the training performance. We run FSGDA and SAGDA on the heterogeneous "a9a" [40] dataset with the regression model described in Section 4. We fix the local step-size at 0.01 and the number of workers at 100, and choose the number of local update rounds K from the discrete set {2, 10, 20}. The algorithm needs more communication rounds when K is small, which matches our Corollary 2 and Corollary 3.

Impact of the Local Step-size: In this experiment, we choose the local step-size from the discrete set {0.0001, 0.001, 0.01}. As shown in Figure 1(a) and Figure 6(a), larger local step-sizes lead to faster convergence.

Impact of the Global Step-size: We choose the global step-size from the discrete set {2, 5, 10}, and fix the number of workers at 100 and the number of local update rounds at 10.
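To make the roles of K, the local step-size, and the global step-size concrete, here is a minimal sketch of a generic federated (local) gradient descent ascent loop. It is not the FSGDA or SAGDA implementation used in these experiments: the three-client toy saddle objective, its coefficients, the initial point, the tolerance, and the helper names (`fed_gda_round`, `rounds_to_tol`) are all hypothetical. In each round, every client runs K simultaneous descent-ascent steps with the local step-size, and the server moves the global model by the global step-size times the averaged client update.

```python
import math
import random

# Hypothetical heterogeneous clients. Client i holds the toy objective
#   f_i(x, y) = a_i/2 * x^2 + b_i * x * y - c_i/2 * y^2,
# strongly convex in x and strongly concave in y (a_i, c_i > 0),
# so every client (and their average) has its saddle point at (0, 0).
CLIENTS = [(0.5, 1.0, 1.5), (1.5, -0.5, 0.5), (1.0, 0.2, 1.0)]


def client_grads(a, b, c, x, y):
    """Return (df_i/dx, df_i/dy) for one client's toy objective."""
    return a * x + b * y, b * x - c * y


def fed_gda_round(x, y, K, eta_l, eta_g, noise_std=0.0, rng=None):
    """One communication round: K local GDA steps per client, then a server step."""
    rng = rng if rng is not None else random.Random(0)
    dx_sum, dy_sum = 0.0, 0.0
    for a, b, c in CLIENTS:
        xi, yi = x, y  # every client starts from the current global model
        for _ in range(K):
            gx, gy = client_grads(a, b, c, xi, yi)
            if noise_std > 0:  # optional Gaussian noise mimics stochastic gradients
                gx += rng.gauss(0.0, noise_std)
                gy += rng.gauss(0.0, noise_std)
            xi, yi = xi - eta_l * gx, yi + eta_l * gy  # descend in x, ascend in y
        dx_sum += xi - x
        dy_sum += yi - y
    n = len(CLIENTS)
    # The server scales the averaged client update by the global step-size.
    return x + eta_g * dx_sum / n, y + eta_g * dy_sum / n


def rounds_to_tol(K, eta_l=0.01, eta_g=1.0, tol=1e-3, max_rounds=20_000, noise_std=0.0):
    """Communication rounds until (x, y) is within tol of the saddle point (0, 0)."""
    x, y, rng = 1.0, 1.0, random.Random(0)
    for r in range(1, max_rounds + 1):
        x, y = fed_gda_round(x, y, K, eta_l, eta_g, noise_std, rng)
        if math.hypot(x, y) < tol:
            return r
    return max_rounds


if __name__ == "__main__":
    for K in (2, 10, 20):
        print("K =", K, "rounds =", rounds_to_tol(K))
```

On this noiseless toy problem, the number of rounds needed shrinks as K or the local step-size grows, which is qualitatively in line with the trends reported above; it makes no claim about the a9a results themselves.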


Supplementary Material for "Path following algorithms for ℓ2-regularized M-estimation with approximation guarantee"


Figure S2: Number of iterations at each grid point for the Newton and gradient descent methods applied to ℓ2-regularized logistic regression on the simulated data generated in Example 2.

We summarize the results in Figures S1-S3. Figure S1 presents the results for ridge regression. In this case, the number of iterations of the gradient descent method first increases and then stays flat as $t_k$ grows. The Newton method, however, takes only one iteration at each grid point. Moreover, the level of approximation (i.e., ϵ) seems to have no impact on the number of iterations at each grid point, which is highly desirable.
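The one-iteration behaviour of Newton's method on ridge regression follows from the objective being quadratic: from any warm start, a single Newton step lands exactly on the minimizer. The sketch below counts warm-started iterations per grid point for Newton and for gradient descent with step-size 1/L. It assumes a hypothetical 2-feature dataset, a grid of penalty values `LAMS` standing in for the grid points, and a gradient-norm stopping rule with tolerance `eps`; it illustrates the mechanism and is not the paper's path-following algorithm or grid.

```python
import math

# Hypothetical toy data (n = 4 samples, d = 2 features) for ridge regression:
#   F_lam(beta) = 1/(2n) * ||y - X beta||^2 + lam/2 * ||beta||^2
X = [(1.0, 0.5), (0.5, 2.0), (1.5, 1.0), (-1.0, 0.3)]
Y = [1.0, 2.0, 2.5, -0.5]
N = len(X)

# Precompute A = X^T X / n and b = X^T y / n, so that grad F = (A + lam I) beta - b.
A = [[sum(r[i] * r[j] for r in X) / N for j in range(2)] for i in range(2)]
B = [sum(r[i] * yv for r, yv in zip(X, Y)) / N for i in range(2)]


def grad(beta, lam):
    return [A[i][0] * beta[0] + A[i][1] * beta[1] + lam * beta[i] - B[i] for i in range(2)]


def hessian(lam):
    """Entries (h00, h01, h11) of the constant Hessian A + lam I."""
    return A[0][0] + lam, A[0][1], A[1][1] + lam


def newton_step(beta, lam):
    """Solve H s = grad with the explicit 2x2 inverse and return beta - s."""
    h00, h01, h11 = hessian(lam)
    g = grad(beta, lam)
    det = h00 * h11 - h01 * h01
    return [beta[0] - (h11 * g[0] - h01 * g[1]) / det,
            beta[1] - (h00 * g[1] - h01 * g[0]) / det]


def gd_step(beta, lam):
    """Gradient step with step-size 1/L, L = largest eigenvalue of H."""
    h00, h01, h11 = hessian(lam)
    L = (h00 + h11) / 2 + math.hypot((h00 - h11) / 2, h01)
    g = grad(beta, lam)
    return [beta[0] - g[0] / L, beta[1] - g[1] / L]


def path_iterations(step, lams, eps=1e-8, max_iter=100_000):
    """Warm-started path following: iterations used at each grid point."""
    beta, counts = [0.0, 0.0], []
    for lam in lams:
        k = 0
        while math.hypot(*grad(beta, lam)) > eps and k < max_iter:
            beta = step(beta, lam)
            k += 1
        counts.append(k)
    return counts


LAMS = [1.0, 0.5, 0.25, 0.125, 0.0625]
print("Newton:", path_iterations(newton_step, LAMS))
print("GD:    ", path_iterations(gd_step, LAMS))
```

On this grid, the Newton count is 1 at every grid point, while gradient descent needs more iterations in total; its exact counts depend on the conditioning of A + λI.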





Algorithm 3: Primal-Dual Method
Initialize the particles $\{\theta_{i,0}\}_{i=1}^{n}$ and $\lambda_0$.


So we can check that $\frac{d}{dt} E(q_t, \lambda_t) \ldots (q_t, \lambda_t)$ in both cases. Combining the two cases yields the result.

… $\sum_{i=1}^{m} \mathcal{N}(\theta; \mu_i, \sigma_i^2)$, where $m$ is fixed to 5 in all the experiments.

Monotonic Bayesian Neural Networks: In this experiment, we use the COMPAS dataset (J. …). The task is to predict whether the individual will commit a crime again within 2 years.
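Particle samplers such as SVGD or Langevin dynamics typically consume the score $\nabla_\theta \log p(\theta)$ of the target. For a mixture of the form $\sum_{i=1}^{m} \mathcal{N}(\theta; \mu_i, \sigma_i^2)$, the score is $\sum_i r_i(\theta)(\mu_i - \theta)/\sigma_i^2$, where $r_i(\theta)$ are the component responsibilities. The minimal sketch below computes it for scalar $\theta$ with log-sum-exp for numerical stability; the equal default weights, the example means and scales, and the function names are hypothetical and not taken from the experiments.

```python
import math


def _weighted_component_logs(theta, mus, sigmas, weights):
    """log(w_i * N(theta; mu_i, sigma_i^2)) for each mixture component."""
    return [
        math.log(w) - 0.5 * math.log(2.0 * math.pi * s * s) - (theta - mu) ** 2 / (2.0 * s * s)
        for w, mu, s in zip(weights, mus, sigmas)
    ]


def mixture_log_density(theta, mus, sigmas, weights=None):
    """log p(theta) for p = sum_i w_i N(mu_i, sigma_i^2), via log-sum-exp."""
    weights = weights if weights is not None else [1.0 / len(mus)] * len(mus)
    logs = _weighted_component_logs(theta, mus, sigmas, weights)
    top = max(logs)
    return top + math.log(sum(math.exp(l - top) for l in logs))


def mixture_score(theta, mus, sigmas, weights=None):
    """d/dtheta log p(theta) = sum_i r_i(theta) * (mu_i - theta) / sigma_i^2."""
    weights = weights if weights is not None else [1.0 / len(mus)] * len(mus)
    logs = _weighted_component_logs(theta, mus, sigmas, weights)
    top = max(logs)
    unnorm = [math.exp(l - top) for l in logs]  # responsibilities up to a constant
    total = sum(unnorm)
    return sum(u / total * (mu - theta) / (s * s) for u, mu, s in zip(unnorm, mus, sigmas))


# Usage with m = 5 made-up components.
MUS = [-4.0, -2.0, 0.0, 2.0, 4.0]
SIGMAS = [1.0, 0.5, 1.0, 1.5, 1.0]
print(mixture_score(0.7, MUS, SIGMAS))
```

Because the score ignores the normalizing constant, the same function applies whether or not the mixture weights sum to one.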






In our method and all the baselines except surrogate-based triage, we use the cross-entropy loss and optimize it with stochastic gradient descent using the Adam optimizer [40], with the initial learning rate set by cross-validation independently for each method and level of triage $b$. In surrogate-based triage, we use the loss and optimization method used by the authors in their public implementation. Moreover, we use early stopping with patience parameter ep = 10, i.e., we stop the training process if no reduction of the cross-entropy loss on the validation set is observed for 10 consecutive epochs. This suggests that the humans are more accurate than the predictive model throughout the entire feature space. This suggests that the humans are less accurate than the predictive model in some regions of the feature space.
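The patience-based early-stopping rule can be made concrete with a small helper. This is a sketch under stated assumptions, not the authors' training code: the function name `early_stopping_epoch` is hypothetical, it takes a precomputed list of per-epoch validation losses instead of training a model, and it counts only a strict decrease below the best loss so far as an improvement.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return (stop_epoch, best_epoch) for patience-based early stopping.

    Training stops at the first epoch where the validation loss has not
    dropped below its best value for `patience` consecutive epochs; if that
    never happens, training runs to the last epoch.
    """
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:  # a strict improvement resets the counter
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Usage with made-up losses: the best loss is at epoch 2, followed by 10 flat epochs.
losses = [1.0, 0.8, 0.7] + [0.75] * 10 + [0.5]
print(early_stopping_epoch(losses, patience=10))  # (12, 2)
```

The later loss of 0.5 is never seen, which is the usual trade-off of a finite patience; a common companion step is to restore the weights saved at `best_epoch`.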