Gradient Descent
Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
In the synchronous stochastic gradient method, each worker processes a random minibatch of its training data, and then the local updates are synchronized by making an All-Reduce step, which aggregates stochastic gradients from all workers, and taking a Broadcast step that transmits the updated parameter vector back to all workers.
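The synchronization pattern described above can be sketched in a single process. This is a minimal illustration under stated assumptions, not the authors' implementation: the workers are simulated as list entries, the All-Reduce is taken to be a plain average of the local gradients, and the Broadcast is implicit because every simulated worker reads the same parameter list. The helper names (`make_shards`, `local_gradient`, `all_reduce_mean`, `sync_sgd`), the least-squares loss, and all hyperparameters are hypothetical choices made for the example. The gradient sparsification that the paper's title refers to is not shown.

```python
import random


def make_shards(num_workers, per_worker, true_w, seed=0):
    """Build one noiseless linear-regression shard per simulated worker."""
    rng = random.Random(seed)
    shards = []
    for _ in range(num_workers):
        shard = []
        for _ in range(per_worker):
            x = [rng.uniform(-1.0, 1.0) for _ in true_w]
            y = sum(wi * xi for wi, xi in zip(true_w, x))
            shard.append((x, y))
        shards.append(shard)
    return shards


def local_gradient(w, batch):
    """Mean gradient of 0.5 * (w.x - y)^2 over one worker's minibatch."""
    grad = [0.0] * len(w)
    for x, y in batch:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += err * xj / len(batch)
    return grad


def all_reduce_mean(grads):
    """Stand-in for All-Reduce: average the gradients from all workers."""
    n = len(grads)
    return [sum(g[j] for g in grads) / n for j in range(len(grads[0]))]


def sync_sgd(shards, dim, lr=0.5, steps=500, batch_size=4, seed=0):
    """Synchronous SGD over simulated workers (assumed setup, see lead-in)."""
    rng = random.Random(seed)
    w = [0.0] * dim  # shared parameter vector
    for _ in range(steps):
        # Each worker draws a random minibatch from its own shard.
        grads = [local_gradient(w, rng.sample(shard, batch_size))
                 for shard in shards]
        g = all_reduce_mean(grads)  # All-Reduce step
        w = [wi - lr * gi for wi, gi in zip(w, g)]
        # Broadcast step: all workers read the updated `w` next iteration.
    return w


shards = make_shards(num_workers=4, per_worker=16, true_w=[2.0, -1.0], seed=1)
w = sync_sgd(shards, dim=2)
print(w)
```

Because the synthetic data are noiseless, the printed vector should approach the generating vector `[2.0, -1.0]`. In a real cluster the averaging and the parameter exchange would be carried out by a communication library rather than by shared memory.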
Toward Better PAC-Bayes Bounds for Uniformly Stable Algorithms
Yunwen Lei
We give sharper bounds for uniformly stable randomized algorithms in a PAC-Bayesian framework, which improve the existing results by up to a factor of n (ignoring a log factor), where n is the sample size. The key idea is to bound the moment generating function of the generalization gap using concentration of weakly dependent random variables due to Bousquet et al. (2020). We introduce an assumption of a sub-exponential stability parameter, which allows a general treatment that we instantiate in two applications: stochastic gradient descent and randomized coordinate descent. Our results eliminate the requirement of strong convexity from previous results, and hold for non-smooth convex problems.
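For readers unfamiliar with the second application, the following is a minimal sketch of randomized coordinate descent. It illustrates only the algorithm, not the stability analysis or the bounds. The tiny dataset, the smooth least-squares loss, the step size, and the function names (`partial_derivative`, `randomized_coordinate_descent`) are assumptions made for the example; the abstract itself covers non-smooth convex problems, which this smooth example does not exercise.

```python
import random


def partial_derivative(w, data, i):
    """i-th partial derivative of the mean of 0.5 * (w.x - y)^2 over data."""
    total = 0.0
    for x, y in data:
        err = sum(wj * xj for wj, xj in zip(w, x)) - y
        total += err * x[i]
    return total / len(data)


def randomized_coordinate_descent(data, dim, lr=0.5, steps=2000, seed=0):
    """Update one uniformly random coordinate per iteration."""
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        i = rng.randrange(dim)  # the randomness the algorithm depends on
        w[i] -= lr * partial_derivative(w, data, i)
    return w


# Hypothetical dataset, consistent with the linear model w = (2, -1).
data = [((1.0, 0.0), 2.0), ((0.0, 1.0), -1.0), ((1.0, 1.0), 1.0)]
w = randomized_coordinate_descent(data, dim=2)
print(w)
```

The output depends on both the training sample and the random sequence of coordinates, which is the sense in which the algorithm is randomized.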