Mathematical & Statistical Methods
Epidemic Learning: Boosting Decentralized Learning with Randomized Communication
We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence than conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of s other nodes (in a system of n nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology yields superior convergence properties compared to state-of-the-art (static and dynamic) topologies. Considering smooth nonconvex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in $\mathcal{O}(n^3/s^2)$.
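To make the communication pattern concrete, here is a minimal Python sketch of one EL-style round: every node takes a local gradient step and pushes its model to s peers sampled uniformly at random, so the topology changes every round. The function name and the plain-averaging merge step are our illustrative assumptions, not the paper's exact update rule.

```python
import random
import numpy as np

def epidemic_learning_round(models, grads, s, lr=0.1):
    """One round of a simplified EL-style scheme: local step, then each node
    sends its model to s uniformly random other nodes; every node averages
    its own model with whatever it received. A sketch, not the paper's rule."""
    n = len(models)
    # Local SGD step at every node.
    updated = [m - lr * g for m, g in zip(models, grads)]
    inbox = [[m] for m in updated]  # each node keeps its own model
    for i in range(n):
        # Node i pushes its update to s random other nodes (fresh topology).
        for j in random.sample([k for k in range(n) if k != i], s):
            inbox[j].append(updated[i])
    # Merge step: plain averaging of all received models.
    return [np.mean(msgs, axis=0) for msgs in inbox]
```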
Necessary and Sufficient Geometries for Gradient Methods
We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal.
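As an illustration of what "diagonally pre-conditioned stochastic gradient methods" means here, the following is a minimal AdaGrad-style step (our sketch; the paper's minimax-optimal method and the projection onto the constraint set are not reproduced):

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One diagonally preconditioned stochastic gradient step: each coordinate
    is scaled by the inverse root of its accumulated squared gradients."""
    accum = accum + grad ** 2                   # per-coordinate second moments
    x = x - lr * grad / (np.sqrt(accum) + eps)  # diagonal preconditioner
    return x, accum
```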
Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent
Mingze Wang (Massachusetts Institute of Technology; Peking University; NTT Research)
Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries, a broad subclass of continuous symmetries present in the model architecture or loss function, interplay with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a "Noether flow") of the parameters θ along the degenerate direction to a unique initialization-independent fixed point θ*.
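For concreteness, here is a standard instance of an exponential symmetry, written out in LaTeX; the rescaling example is a textbook illustration we supply, not a quotation from the paper.

```latex
% Exponential symmetry: the loss is invariant along the one-parameter group
% generated by a matrix M:
L\!\left(e^{\lambda M}\theta\right) = L(\theta)
  \qquad \text{for all } \lambda \in \mathbb{R}.
% Example (rescaling symmetry of a product parameterization u_i w_i):
L\!\left(e^{\lambda} u,\; e^{-\lambda} w\right) = L(u, w),
  \quad \text{i.e. } M = \operatorname{diag}(I, -I),
% so the degenerate direction is the joint rescaling of u and w.
```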
Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization
Pan Xu, Jinghui Chen, Difan Zou, Quanquan Gu
We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates.
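A minimal sketch of the canonical member of this family, stochastic gradient Langevin dynamics (SGLD), i.e., the Euler-Maruyama discretization of Langevin dynamics; the function names are ours, and `grad_f` may return a stochastic (mini-batch) gradient over the n components.

```python
import numpy as np

def sgld(grad_f, x0, n_steps=1000, step=1e-3, beta=10.0, rng=None):
    """Stochastic Gradient Langevin Dynamics: discretization of
    dX_t = -grad f(X_t) dt + sqrt(2 / beta) dB_t at inverse temperature beta."""
    rng = rng or np.random.default_rng(0)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x - step * grad_f(x) + np.sqrt(2.0 * step / beta) * noise
    return x
```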
Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust
We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erdős-Rényi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretic lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
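For contrast with the node-private setting, here is a hedged baseline showing the much easier edge-private estimator via the Laplace mechanism; this is emphatically not the paper's sum-of-squares algorithm, and node privacy is far harder because a single node can change up to n-1 edges.

```python
import numpy as np

def edge_dp_density(adj, epsilon, rng=None):
    """Baseline edge-DP density estimate via the Laplace mechanism.
    NOT the paper's node-private algorithm: this only protects single edges,
    whose sensitivity on the density is 1 / C(n, 2)."""
    rng = rng or np.random.default_rng(0)
    n = adj.shape[0]
    m = np.triu(adj, k=1).sum()      # number of edges
    n_pairs = n * (n - 1) / 2
    sensitivity = 1.0 / n_pairs      # one edge moves the density by this much
    return m / n_pairs + rng.laplace(scale=sensitivity / epsilon)
```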
I Background in Linear Algebra
In this section we state some elementary results that we will use for our main proofs.

I.1 Johnson-Lindenstrauss and subspace embeddings

A useful definition for our proofs is the JL moment property, which bounds the moments of the length of Sx. We mention a corollary from [40] which states that JLTs also preserve pairwise angles, an important by-product that we will use in our proofs. The next lemma is part of the proof of [44, Lemma 4.2]; we state it here as a separate result to shorten the longer proofs that follow.

Lemma 4. Let S be an (ε, δ)-OSE for a d × k matrix U.
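As a quick numerical illustration of the norm preservation behind the JL moment property, the following sketch builds a Gaussian JLT and checks that embedded lengths (and hence, via the polarization identity, pairwise angles) are approximately preserved; constants are not tuned and the construction is the textbook Gaussian one, not necessarily the one used in the proofs above.

```python
import numpy as np

def gaussian_jlt(d, k, rng=None):
    """A Gaussian Johnson-Lindenstrauss transform S in R^{k x d}: for k on the
    order of log(1/delta)/eps^2, ||Sx|| = (1 +/- eps)||x|| w.p. 1 - delta."""
    rng = rng or np.random.default_rng(0)
    return rng.standard_normal((k, d)) / np.sqrt(k)

# Sanity check: the embedded length of a random vector is close to its length.
rng = np.random.default_rng(1)
x = rng.standard_normal(2000)
S = gaussian_jlt(d=2000, k=400, rng=rng)
print(np.linalg.norm(S @ x) / np.linalg.norm(x))  # close to 1
```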
(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs
Boaz Barak, Chi-Ning Chou, Zhixian Lei, Tselil Schramm, Yueqi Sheng
We give the first efficient algorithms proven to succeed in the correlated Erdős-Rényi model (Pedarsani and Grossglauser, 2011). Specifically, we give a polynomial-time algorithm for the graph similarity/hypothesis-testing task that works for every constant level of correlation between the two graphs, which can be arbitrarily close to zero.
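To fix the input distribution, here is a short sketch of sampling a correlated pair in the subsampling formulation of the Pedarsani-Grossglauser model (our illustration of the model only; the matching and hypothesis-testing algorithms themselves are not sketched):

```python
import numpy as np

def correlated_er_pair(n, p, s, rng=None):
    """Sample two correlated Erdos-Renyi graphs: draw a parent G ~ G(n, p),
    then each child keeps every parent edge independently with probability s.
    Marginally each child is G(n, p*s); shared parent edges correlate them."""
    rng = rng or np.random.default_rng(0)
    iu = np.triu_indices(n, k=1)
    parent = rng.random(len(iu[0])) < p
    child = lambda: parent & (rng.random(parent.shape) < s)
    def to_adj(edges):
        A = np.zeros((n, n), dtype=int)
        A[iu] = edges
        return A + A.T
    return to_adj(child()), to_adj(child())
```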