Mathematical & Statistical Methods


Epidemic Learning: Boosting Decentralized Learning with Randomized Communication

Neural Information Processing Systems

We present Epidemic Learning (EL), a simple yet powerful decentralized learning (DL) algorithm that leverages changing communication topologies to achieve faster model convergence compared to conventional DL approaches. At each round of EL, each node sends its model updates to a random sample of s other nodes (in a system of n nodes). We provide an extensive theoretical analysis of EL, demonstrating that its changing topology culminates in superior convergence properties compared to the state-of-the-art (static and dynamic) topologies. Considering smooth nonconvex loss functions, the number of transient iterations for EL, i.e., the rounds required to achieve asymptotic linear speedup, is in O(
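
At a high level, one EL round is local SGD followed by push-style randomized gossip. Below is a minimal NumPy sketch under assumed simplifications: the function name el_round and the rule that every node averages its own update with whatever it receives are illustrative, not the paper's exact protocol.

```python
import numpy as np

def el_round(models, grads, lr, s, rng):
    """Illustrative sketch of one Epidemic-Learning-style round (not the paper's
    exact protocol): each node takes a local SGD step, sends its model to s
    uniformly random other nodes, and averages its own model with those received."""
    n = len(models)
    updated = [m - lr * g for m, g in zip(models, grads)]
    inbox = [[u] for u in updated]                     # every node keeps its own update
    for i in range(n):
        targets = rng.choice([j for j in range(n) if j != i], size=s, replace=False)
        for j in targets:
            inbox[j].append(updated[i])
    return [np.mean(msgs, axis=0) for msgs in inbox]
```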


Asymptotic Guarantees for Learning Generative Models with the Sliced-Wasserstein Distance

Neural Information Processing Systems

Minimum expected distance estimation (MEDE) algorithms have been widely used for probabilistic models with intractable likelihood functions and they have become increasingly popular due to their use in implicit generative modeling (e.g.
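
The sliced-Wasserstein distance itself is straightforward to estimate by Monte Carlo: project both samples onto random directions and average one-dimensional Wasserstein distances, which for equal-size empirical measures reduce to distances between sorted projections. A minimal NumPy sketch follows; the function name and defaults are illustrative.

```python
import numpy as np

def sliced_wasserstein(x, y, n_projections=100, p=2, rng=None):
    """Monte Carlo estimate of the sliced-Wasserstein-p distance between two
    equal-size samples x, y of shape (n, d). Illustrative sketch."""
    rng = np.random.default_rng() if rng is None else rng
    dirs = rng.normal(size=(n_projections, x.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)    # random unit directions
    xp, yp = x @ dirs.T, y @ dirs.T                        # 1-D projections
    # 1-D Wasserstein-p between equal-size empirical measures: sort and compare
    xs, ys = np.sort(xp, axis=0), np.sort(yp, axis=0)
    return float(np.mean(np.abs(xs - ys) ** p) ** (1.0 / p))
```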


Necessary and Sufficient Geometries for Gradient Methods

Neural Information Processing Systems

We study the impact of the constraint set and gradient geometry on the convergence of online and stochastic methods for convex optimization, providing a characterization of the geometries for which stochastic gradient and adaptive gradient methods are (minimax) optimal. In particular, we show that when the constraint set is quadratically convex, diagonally pre-conditioned stochastic gradient methods are minimax optimal.
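
A standard instance of the diagonally pre-conditioned methods referred to here is diagonal AdaGrad, which rescales each coordinate by the root of its accumulated squared gradients. A minimal sketch, with names and defaults that are illustrative rather than tied to the paper:

```python
import numpy as np

def adagrad_step(theta, grad, accum, lr=0.1, eps=1e-8):
    """One diagonal AdaGrad step: per-coordinate preconditioning by the square
    root of accumulated squared gradients. Illustrative sketch."""
    accum = accum + grad ** 2
    theta = theta - lr * grad / (np.sqrt(accum) + eps)
    return theta, accum

# toy usage: minimize ||theta - 1||^2
theta, accum = np.zeros(5), np.zeros(5)
for _ in range(200):
    theta, accum = adagrad_step(theta, 2 * (theta - 1.0), accum)
```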



Parameter Symmetry and Noise Equilibrium of Stochastic Gradient Descent

Neural Information Processing Systems

Symmetries are prevalent in deep learning and can significantly influence the learning dynamics of neural networks. In this paper, we examine how exponential symmetries - a broad subclass of continuous symmetries present in the model architecture or loss function - interplay with stochastic gradient descent (SGD). We first prove that gradient noise creates a systematic motion (a "Noether flow") of the parameters θ along the degenerate direction to a unique initialization-independent fixed point θ
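
A toy example of such an exponential symmetry is the rescaling invariance of a product parameterization: for f(u, v) = (uv - 1)^2 the loss is unchanged under (u, v) → (e^a u, e^(-a) v), and u^2 - v^2 parameterizes the resulting degenerate direction, which noiseless gradient flow conserves. The sketch below only illustrates the symmetry and the conserved quantity; the paper's result concerns how SGD noise breaks this conservation and drives the parameters to a fixed point.

```python
import numpy as np

def loss(u, v):
    return (u * v - 1.0) ** 2

def grad(u, v):
    r = 2.0 * (u * v - 1.0)
    return r * v, r * u

u, v = 2.0, 0.3
for a in (0.0, 0.5, -1.0):
    print(loss(np.exp(a) * u, np.exp(-a) * v))     # identical values: exponential symmetry

uu, vv = u, v
for _ in range(1000):                              # noiseless gradient descent
    gu, gv = grad(uu, vv)
    uu, vv = uu - 1e-3 * gu, vv - 1e-3 * gv
print(u**2 - v**2, uu**2 - vv**2)                  # u^2 - v^2 is (approximately) conserved
```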


Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Neural Information Processing Systems

We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical approximations to Langevin dynamics, which leads to faster convergence rates.
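
The canonical algorithm in this family is stochastic gradient Langevin dynamics (SGLD): a (stochastic) gradient step plus Gaussian noise scaled so the iterates approximately sample the Gibbs measure proportional to exp(-beta * f). A minimal sketch, with illustrative names and defaults:

```python
import numpy as np

def sgld(grad_f, theta0, n_iters=10_000, step=1e-3, beta=1.0, rng=None):
    """Stochastic Gradient Langevin Dynamics (illustrative sketch): gradient step
    plus Gaussian noise targeting the Gibbs measure proportional to exp(-beta * f).
    grad_f may return a full or a mini-batch stochastic gradient."""
    rng = np.random.default_rng() if rng is None else rng
    theta = np.array(theta0, dtype=float)
    for _ in range(n_iters):
        theta = theta - step * grad_f(theta) \
                + np.sqrt(2.0 * step / beta) * rng.normal(size=theta.shape)
    return theta

# toy usage: for f(t) = ||t||^2 / 2 the iterates approximately sample N(0, I)
final = sgld(lambda t: t, np.zeros(2))
```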


Private Edge Density Estimation for Random Graphs: Optimal, Efficient and Robust

Neural Information Processing Systems

We give the first polynomial-time, differentially node-private, and robust algorithm for estimating the edge density of Erdős-Rényi random graphs and their generalization, inhomogeneous random graphs. We further prove information-theoretic lower bounds, showing that the error rate of our algorithm is optimal up to logarithmic factors. Previous algorithms incur either exponential running time or suboptimal error rates. Two key ingredients of our algorithm are (1) a new sum-of-squares algorithm for robust edge density estimation, and (2) the reduction from privacy to robustness based on sum-of-squares exponential mechanisms due to Hopkins et al. (STOC 2023).
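
For contrast with the sum-of-squares approach, the naive node-private baseline is simply the Laplace mechanism applied to the empirical edge density: rewiring one node changes at most n - 1 edges, so the density e / C(n, 2) has node sensitivity 2 / n. The sketch below shows only this baseline (it is neither robust nor rate-optimal, and it is not the paper's algorithm); the function name is illustrative.

```python
import numpy as np

def naive_node_dp_edge_density(adj, epsilon, rng=None):
    """Naive epsilon-node-DP baseline (not the paper's algorithm): empirical edge
    density plus Laplace noise calibrated to node sensitivity 2 / n."""
    rng = np.random.default_rng() if rng is None else rng
    n = adj.shape[0]
    edges = np.triu(adj, k=1).sum()                 # count each undirected edge once
    density = edges / (n * (n - 1) / 2)
    sensitivity = 2.0 / n                           # rewiring one node moves <= n - 1 edges
    return density + rng.laplace(scale=sensitivity / epsilon)
```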


I Background in Linear Algebra

Neural Information Processing Systems

In this section we state some elementary results that we will use in our main proofs. I.1 Johnson-Lindenstrauss and subspace embeddings. A useful definition for our proofs is the JL moment property, which bounds the moments of the length of Sx. We mention a corollary from [40] which states that JLTs also preserve pairwise angles, an important by-product that we use in our proofs. The next lemma is part of the proof of [44, Lemma 4.2]; we state it here as a separate result to shorten the longer proofs that follow. Lemma 4. Let S be an (ϵ, δ)-OSE for a d × k matrix U.
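
As a concrete reference point for these definitions, the simplest JL transform is a dense Gaussian sketch: a matrix S with i.i.d. N(0, 1/m) entries satisfies E||Sx||^2 = ||x||^2 and concentrates for each fixed x. A minimal sketch, not tied to the particular constructions cited above:

```python
import numpy as np

def gaussian_jl(x_rows, m, rng=None):
    """Gaussian Johnson-Lindenstrauss transform (illustrative): rows of x_rows are
    mapped from dimension d down to m while approximately preserving their norms."""
    rng = np.random.default_rng() if rng is None else rng
    d = x_rows.shape[1]
    S = rng.normal(scale=1.0 / np.sqrt(m), size=(m, d))   # i.i.d. N(0, 1/m) entries
    return x_rows @ S.T

# quick check of norm preservation on one random vector
x = np.random.default_rng(0).normal(size=(1, 1000))
print(np.linalg.norm(x), np.linalg.norm(gaussian_jl(x, m=200)))   # close with high probability
```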



(Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs

Neural Information Processing Systems

We give the first efficient algorithms proven to succeed in the correlated Erdős-Rényi model (Pedarsani and Grossglauser, 2011). Specifically, we give a polynomial-time algorithm for the graph similarity/hypothesis testing task that works for every constant level of correlation between the two graphs, even one arbitrarily close to zero.
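
A common way to formulate the correlated Erdős-Rényi model referenced here is by subsampling a shared parent graph: both observed graphs keep each parent edge independently, so each is marginally G(n, p) while their edge sets are positively correlated. The sketch below generates such a pair under that assumed formulation; parameter names are illustrative, and it omits the hidden vertex permutation used in the full matching task.

```python
import numpy as np

def correlated_erdos_renyi(n, p, gamma, rng=None):
    """Sample a correlated Erdős-Rényi pair (illustrative sketch): draw a parent
    G(n, p / gamma) graph, then keep each parent edge independently with
    probability gamma in each child, so every child edge has marginal probability p.
    Requires 0 < gamma <= 1 and p <= gamma."""
    rng = np.random.default_rng() if rng is None else rng
    parent = np.triu(rng.random((n, n)) < p / gamma, k=1)
    keep_a = np.triu(rng.random((n, n)) < gamma, k=1)
    keep_b = np.triu(rng.random((n, n)) < gamma, k=1)
    a, b = parent & keep_a, parent & keep_b
    return a | a.T, b | b.T          # symmetric boolean adjacency matrices
```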