Solving Non-smooth Constrained Programs with Lower Complexity than $\mathcal{O}(1/\varepsilon)$: A Primal-Dual Homotopy Smoothing Approach
We propose a new primal-dual homotopy smoothing algorithm for a linearly constrained convex program, where neither the primal nor the dual function has to be smooth or strongly convex. The best known iteration complexity for solving such a non-smooth problem is $\mathcal{O}(\varepsilon^{-1})$. In this paper, we show that by leveraging a local error bound condition on the dual function, the proposed algorithm can achieve a better primal convergence time of $\mathcal{O}\left(\varepsilon^{-2/(2+\beta)}\log_2(\varepsilon^{-1})\right)$, where $\beta\in(0,1]$ is a local error bound parameter. As an example application, we show that the distributed geometric median problem, which can be formulated as a constrained convex program, has a non-smooth dual function that nevertheless satisfies the aforementioned local error bound condition with $\beta=1/2$, and therefore enjoys a convergence time of $\mathcal{O}\left(\varepsilon^{-4/5}\log_2(\varepsilon^{-1})\right)$. This result improves upon the $\mathcal{O}(\varepsilon^{-1})$ convergence time bound achieved by existing distributed optimization algorithms. Simulation experiments also demonstrate the performance of our proposed algorithm.
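To see how the geometric-median rate follows from the general bound, substitute the stated error bound parameter $\beta=1/2$ into the exponent:
$$
\mathcal{O}\!\left(\varepsilon^{-\frac{2}{2+\beta}}\log_2(\varepsilon^{-1})\right)\Big|_{\beta=1/2}
=\mathcal{O}\!\left(\varepsilon^{-\frac{2}{5/2}}\log_2(\varepsilon^{-1})\right)
=\mathcal{O}\!\left(\varepsilon^{-4/5}\log_2(\varepsilon^{-1})\right).
$$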
The Convergence Rate of Neural Networks for Learned Functions of Different Frequencies
Ronen Basri, David Jacobs, Yoni Kasten, Shira Kritchman
We study the relationship between the frequency of a function and the speed at which a neural network learns it. We build on recent results that show that the dynamics of overparameterized neural networks trained with gradient descent can be well approximated by a linear system. When normalized training data is uniformly distributed on a hypersphere, the eigenfunctions of this linear system are spherical harmonic functions.
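A minimal sketch of the linearized picture behind this statement (the notation $r_t$, $H$, $\phi_k$, $\lambda_k$, $\eta$ is ours, not the paper's): under gradient flow on the linearized model, each eigencomponent of the residual decays independently, so spherical-harmonic components with larger eigenvalues (lower frequencies, under the usual eigenvalue decay with frequency) are fit faster:
$$
\frac{\mathrm{d} r_t}{\mathrm{d} t}=-\eta H r_t
\quad\Longrightarrow\quad
\langle r_t,\phi_k\rangle=e^{-\eta\lambda_k t}\,\langle r_0,\phi_k\rangle,
$$
where $r_t=f_t-f^\ast$ is the residual between the network function and the target, and $H$ is the linear operator governing the approximated training dynamics.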
The paper considers global and local path planning for multiple agents in 2-D with a centralized message-passing algorithm derived from the three-weight version of ADMM, an established algorithm. The contributions are clearly stated in the introduction: the authors decompose the global planning optimization into several sub-problems they dub minimizers, which describe the various planning objectives that comprise the larger overall problem to be solved. Minimizers are derived for avoiding inter-agent collisions, avoiding collisions with static obstacles, and for maximizing/minimizing kinetic energy or velocity. They also apply their approach to local planning by reformulating the joint optimization.
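As a rough illustration of what one such minimizer can look like (a hypothetical sketch in our own notation, not the paper's actual three-weight ADMM updates), here is a pairwise inter-agent collision step that projects two proposed positions onto a minimum-separation constraint:

```python
import numpy as np

def collision_minimizer(x_i, x_j, r_min):
    """Illustrative pairwise inter-agent collision step: project two proposed
    2-D positions onto the set {||x_i - x_j|| >= r_min} by pushing them apart
    symmetrically along the line joining them. A hypothetical sketch only; the
    paper's ADMM minimizers act on trajectories and also cover static obstacles
    and kinetic-energy/velocity terms."""
    d = x_i - x_j
    dist = np.linalg.norm(d)
    if dist >= r_min:                  # already feasible: leave unchanged
        return x_i, x_j
    if dist < 1e-12:                   # coincident points: pick an arbitrary direction
        unit = np.zeros_like(x_i)
        unit[0] = 1.0
    else:
        unit = d / dist
    gap = (r_min - dist) / 2.0         # split the required extra separation equally
    return x_i + gap * unit, x_j - gap * unit

# Two agents proposed 0.5 apart with a 1.0 minimum separation:
a, b = collision_minimizer(np.array([0.0, 0.0]), np.array([0.5, 0.0]), r_min=1.0)
print(a, b)   # [-0.25, 0.] and [0.75, 0.], now exactly r_min apart
```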
"NIPS Neural Information Processing Systems 8-11th December 2014, Montreal, Canada",,, "Paper ID:","24" "Title:","Communication Efficient Distributed Machine Learning with the Parameter Server" Current Reviews First provide a summary of the paper, and then address the following criteria: Quality, clarity, originality and significance. This paper presents improvements on a system for large-scale learning known as parameter server. The parameter server is designed to perform reliable distributed machine learning in large-scale industrial systems (1000's of nodes). The architecture is based on a bipartite graph composed by servers and workers. Workers compute gradients based on subsets of the training instances, while servers aggregate the workers' gradients, update the shared parameter vector and redistribute it to the workers for the next iteration.
FedSSG: Expectation-Gated and History-Aware Drift Alignment for Federated Learning
Zhanting Zhou, Jinshan Lai, Fengchun Zhang, Zeqin Wu, Fengli Zhang
Non-IID data and partial participation induce client drift and inconsistent local optima in federated learning, causing unstable convergence and accuracy loss. We present FedSSG, a stochastic sampling-guided, history-aware drift alignment method. FedSSG maintains a per-client drift memory that accumulates local model differences as a lightweight sketch of historical gradients; crucially, it gates both the memory update and the local alignment term by a smooth function of the observed/expected participation ratio (a phase-by-expectation signal derived from the server sampler). This statistically grounded gate stays weak and smooth when sampling noise dominates early, then strengthens once participation statistics stabilize, contracting the local-global gap without extra communication. Across CIFAR-10/100 with 100/500 clients and 2-15 percent participation, FedSSG consistently outperforms strong drift-aware baselines and accelerates convergence; on our benchmarks it improves test accuracy by up to a few points (e.g., about +0.9 on CIFAR-10 and about +2.7 on CIFAR-100 on average over the top-2 baseline) and yields about 4.5x faster target-accuracy convergence on average. The method adds only O(d) client memory and a constant-time gate, and degrades gracefully to a mild regularizer under near-IID or uniform sampling. FedSSG shows that sampling statistics can be turned into a principled, history-aware phase control to stabilize and speed up federated training.
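The abstract does not spell out the gate, so the following is only a plausible reading in code form: a smooth, saturating function of the observed/expected participation ratio that scales both a decayed drift-memory update and a generic alignment penalty (the function names, the rational gate, the decay constant, and the quadratic alignment form are our assumptions, not FedSSG's actual definitions):

```python
import numpy as np

def participation_gate(observed, expected, eps=1e-8):
    """Hypothetical smooth gate in [0, 1) built from the observed/expected
    participation ratio r: g = r^2 / (1 + r^2). It stays small while a client
    has been seen less often than expected and saturates toward 1 once observed
    participation matches expectation. (Our choice; the abstract does not
    specify the exact gate.)"""
    r = observed / max(expected, eps)
    return r * r / (1.0 + r * r)

def update_drift_memory(memory, local_delta, gate, decay=0.9):
    """History-aware drift memory: an exponentially decayed accumulation of this
    client's local model differences, with the new term scaled by the gate."""
    return decay * memory + gate * (1.0 - decay) * local_delta

def gated_alignment_penalty(local_w, global_w, gate, mu=0.01):
    """Generic proximal-style local alignment term, also scaled by the gate
    (FedSSG's actual alignment term is not spelled out in the abstract)."""
    return gate * mu * float(np.sum((local_w - global_w) ** 2))

# Toy usage for one client: as expected participation grows, the gate strengthens.
d, sample_prob = 10, 0.1
memory, observed = np.zeros(d), 0
rng = np.random.default_rng(0)
for round_idx in range(1, 51):
    expected = round_idx * sample_prob
    if rng.random() < sample_prob:        # client sampled this round
        observed += 1
        gate = participation_gate(observed, expected)
        local_delta = rng.normal(size=d)  # stand-in for local-minus-global model difference
        memory = update_drift_memory(memory, local_delta, gate)
```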