Distributed Gradient Clustering: Convergence and the Effect of Initialization

Armacki, Aleksandar, Sharma, Himkant, Bajović, Dragana, Jakovetić, Dušan, Chakraborty, Mrityunjoy, Kar, Soummya

arXiv.org Machine Learning

We study the effects of center initialization on the performance of a family of distributed gradient-based clustering algorithms introduced in [1], which operate over connected networks of users. In the considered scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a global clustering of the joint data. We perform extensive numerical experiments evaluating the effects of center initialization on the performance of our family of methods, demonstrating that our methods are more resilient to the effects of initialization than centralized gradient clustering [2]. Next, inspired by the $K$-means++ initialization [3], we propose a novel distributed center initialization scheme, which is shown to improve the performance of our methods compared to the baseline random initialization.
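As a point of reference for the initialization scheme described above, the following is a minimal centralized $K$-means++ seeding sketch in Python. The paper's contribution is a distributed variant of this idea; the function name and NumPy-based implementation below are illustrative and not taken from [1].

```python
import numpy as np

def kmeans_pp_init(X, K, rng=None):
    """Centralized K-means++ seeding: each new center is sampled with
    probability proportional to the squared distance to the nearest
    center chosen so far."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    centers = [X[rng.integers(n)]]  # first center chosen uniformly at random
    for _ in range(K - 1):
        # squared distance of every point to its closest current center
        d2 = np.min(((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1), axis=1)
        probs = d2 / d2.sum()
        centers.append(X[rng.choice(n, p=probs)])
    return np.stack(centers)
```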


A New Kernel Regularity Condition for Distributed Mirror Descent: Broader Coverage and Simpler Analysis

Qiu, Junwen, Zeng, Ziyang, Mei, Leilei, Zhang, Junyu

arXiv.org Machine Learning

Existing convergence results for distributed optimization methods in non-Euclidean geometries typically rely on two kernel assumptions: (i) global Lipschitz smoothness and (ii) bi-convexity of the associated Bregman divergence. Unfortunately, these conditions are violated by nearly all kernels used in practice, leaving a large theory-practice gap. This work closes this gap by developing a unified analytical tool that guarantees convergence under mild conditions. Specifically, we introduce Hessian relative uniform continuity (HRUC), a regularity condition satisfied by nearly all standard kernels. Importantly, HRUC is closed under concatenation, positive scaling, composition, and various kernel combinations. Leveraging the geometric structure induced by HRUC, we derive convergence guarantees for mirror-descent-based gradient tracking without imposing any restrictive assumptions. More broadly, our analysis techniques extend seamlessly to other decentralized optimization methods in genuinely non-Euclidean and non-Lipschitz settings.
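For context, here is a minimal sketch of the basic mirror-descent step that such analyses build on, instantiated on the probability simplex with the negative-entropy kernel (this yields the classical exponentiated-gradient update); it is not the paper's HRUC-based gradient-tracking method, and the names are illustrative.

```python
import numpy as np

def mirror_descent_step(x, grad, eta):
    """One mirror-descent step on the probability simplex with the
    negative-entropy kernel; the resulting update is the classical
    exponentiated-gradient rule  x_+  proportional to  x * exp(-eta * grad)."""
    y = x * np.exp(-eta * grad)
    return y / y.sum()
```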


A Smoothed Analysis of the Greedy Algorithm for the Linear Contextual Bandit Problem

Sampath Kannan, Jamie H. Morgenstern, Aaron Roth, Bo Waggoner, Zhiwei Steven Wu

Neural Information Processing Systems

We give a smoothed analysis, showing that even when contexts may be chosen by an adversary, small perturbations of the adversary's choices suffice for the algorithm to achieve "no regret", perhaps (depending on the specifics of the setting) with a constant amount of initial training data.
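A minimal sketch of the greedy (exploration-free) linear contextual bandit algorithm studied by this kind of smoothed analysis, assuming per-arm ridge least-squares estimates; the class name and regularization choice are illustrative.

```python
import numpy as np

class GreedyLinearBandit:
    """Greedy linear contextual bandit: keep a ridge least-squares estimate
    per arm and always pull the arm whose estimate predicts the highest
    reward for the current context (no explicit exploration)."""

    def __init__(self, n_arms, dim, reg=1.0):
        self.A = [reg * np.eye(dim) for _ in range(n_arms)]  # X^T X + reg*I per arm
        self.b = [np.zeros(dim) for _ in range(n_arms)]      # X^T y per arm

    def select(self, context):
        preds = [context @ np.linalg.solve(A, b) for A, b in zip(self.A, self.b)]
        return int(np.argmax(preds))

    def update(self, arm, context, reward):
        self.A[arm] += np.outer(context, context)
        self.b[arm] += reward * context
```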



dececdcbf0ea0162234a8fb4ab051415-Supplemental-Conference.pdf

Neural Information Processing Systems

Thus, $\gamma(\omega) \in (0,1]$ for $\omega \in (0,1]$, which meets the algorithm design requirement. Algorithm 2 actually performs the gradient descent scheme on the function $\hat{f}_{t_i}(x) = \mathbb{E}_{u \sim B}[f_{t_i}(x+\epsilon u)]$ restricted to the convex set $(1-\zeta)K$.
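A rough sketch of the ball-smoothing idea behind this step, assuming the standard sphere-sampling gradient identity for $\hat{f}(x) = \mathbb{E}_{u \sim B}[f(x+\epsilon u)]$ and a user-supplied Euclidean projection onto the shrunk set $(1-\zeta)K$; the helper `project_shrunk_K` is hypothetical, not from the paper.

```python
import numpy as np

def smoothed_grad_estimate(f, x, eps, n_samples=100, rng=None):
    """Monte-Carlo estimate of the gradient of the ball-smoothed function
    f_hat(x) = E_{u ~ unit ball}[f(x + eps*u)], via the sphere-sampling
    identity  grad f_hat(x) = (d/eps) * E_{v ~ unit sphere}[f(x + eps*v) v]."""
    rng = np.random.default_rng(rng)
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(n_samples):
        v = rng.normal(size=d)
        v /= np.linalg.norm(v)  # uniform direction on the unit sphere
        g += f(x + eps * v) * v
    return (d / eps) * g / n_samples

def projected_step(x, grad, lr, project_shrunk_K):
    """One gradient-descent step kept inside the shrunk set (1 - zeta)K;
    `project_shrunk_K` is a user-supplied projection (hypothetical)."""
    return project_shrunk_K(x - lr * grad)
```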


Online Multitask Learning with Long-Term Memory

Neural Information Processing Systems

Associated with each segment is a hypothesis from some hypothesis class. We give algorithms that are designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses.


All-or-nothing statistical and computational phase transitions in sparse spiked matrix estimation

Neural Information Processing Systems

Similarly, the ISOMAP face database consists of images (256 levels of gray) of size $64 \times 64$, i.e., vectors in $\mathbb{R}^{4096}$, whereas the correct intrinsic dimension is only 3 (for the vertical and horizontal pose and the lighting direction). The second approach is an average-case approach (in the spirit of the statistical mechanics treatment of high-dimensional systems) that models feature vectors by a random ensemble, taken as a set of random vectors with independent and identically distributed (i.i.d.) components and a small but fixed fraction of non-zero components.
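A small sketch of the random ensemble described above, assuming a Bernoulli-Gaussian model: each vector has a fixed fraction $\rho$ of non-zero entries with i.i.d. Gaussian values on the support (the Gaussian choice is an assumption for illustration, not specified by the excerpt).

```python
import numpy as np

def sparse_spike(n, rho, rng=None):
    """Sample a length-n vector with a fixed fraction rho of non-zero
    entries; non-zeros are drawn i.i.d. from a standard Gaussian."""
    rng = np.random.default_rng(rng)
    x = np.zeros(n)
    support = rng.choice(n, size=int(rho * n), replace=False)
    x[support] = rng.normal(size=support.size)
    return x
```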



On the Convergence of Step Decay Step-Size for Stochastic Optimization

Neural Information Processing Systems

Step decay step-size schedules (constant and then cut) are widely used in practice because of their excellent convergence and generalization qualities, but their theoretical properties are not yet well understood. We provide convergence results for step decay in the non-convex regime, ensuring that the gradient norm vanishes at an $\mathcal{O}(\ln T/\sqrt{T})$ rate.
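A minimal sketch of the "constant and then cut" schedule discussed here; the decay interval and factor below are illustrative placeholders.

```python
def step_decay_lr(t, lr0=0.1, decay_every=1000, factor=0.5):
    """Step-decay step size: hold the step size constant for a block of
    iterations, then cut it by a fixed factor."""
    return lr0 * (factor ** (t // decay_every))

# Example: step_decay_lr(2500) == 0.1 * 0.5**2 == 0.025
```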


Bayesian-guided Label Mapping for Visual Reprogramming

Neural Information Processing Systems

However, in this paper, we reveal that one-to-one mappings may overlook the complex relationship between pretrained and downstream labels.