

A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication

Neural Information Processing Systems

Algorithm ...
Require: initial x_{0,i}, ...
1: for j = 0, 1, 2, ..., ... − 1 do
2:   Randomly ... m training ...
3:   Compute ...
4:   Update ...
5:   if ((j + 1) mod p) = 0 then
6:     Compute ...
7:     Quantize ...
8:     Av...
9:     Update ...
10:   end
11: end

In the ... achie... O(1/√(MK)) con... limited ... impair ... gradient ... 2-bit ... ratio 32/2 = ... (if ... We ... the ... communicate ... is sho... each ... parameters.
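The surviving steps of the listing suggest a loop in which each worker takes local updates and, whenever j + 1 is a multiple of p, a quantity is computed, quantized, averaged, and used for an update. The sketch below is one possible reading of that pattern and is not the paper's algorithm: the quadratic objective, the stochastic uniform quantizer, the choice to quantize each worker's accumulated change since the last synchronization, and every name (`quantize`, `local_sgd_quantized`, `period`, `num_workers`) are assumptions made for illustration. Four quantization levels correspond to 2 bits per coordinate, so against 32-bit floats the payload shrinks by 32/2 = 16 times, ignoring the one scale value sent per vector.

```python
import random


def quantize(vec, levels=4, rng=random):
    """Stochastically round each entry onto `levels` evenly spaced values in [-s, s].

    s is the largest absolute entry. levels=4 means 2 bits per coordinate.
    The rounding is unbiased: the expected output equals the input.
    This quantizer is an illustrative assumption, not taken from the source.
    """
    scale = max(abs(v) for v in vec)
    if scale == 0.0:
        return list(vec)
    step = 2.0 * scale / (levels - 1)
    out = []
    for v in vec:
        pos = (v + scale) / step              # position on the grid, in [0, levels - 1]
        low = min(int(pos), levels - 1)       # grid index just below (or at) pos
        frac = pos - low
        idx = low + (1 if rng.random() < frac else 0)
        idx = min(idx, levels - 1)            # guard against floating-point overshoot
        out.append(-scale + idx * step)
    return out


def local_sgd_quantized(num_workers=4, dim=5, steps=200, period=5, lr=0.05, seed=0):
    """Local SGD on a toy quadratic with quantized averaging every `period` steps."""
    rng = random.Random(seed)
    target = [rng.uniform(-1.0, 1.0) for _ in range(dim)]   # minimizer of the toy loss
    synced = [0.0] * dim                                     # last agreed-upon parameters
    workers = [list(synced) for _ in range(num_workers)]

    for j in range(steps):
        # Local step on every worker: noisy gradient of 0.5 * ||x - target||^2.
        for x in workers:
            for d in range(dim):
                grad = (x[d] - target[d]) + rng.gauss(0.0, 0.1)
                x[d] -= lr * grad

        # Synchronize when j + 1 is a multiple of the period.
        if (j + 1) % period == 0:
            deltas = [
                quantize([x[d] - synced[d] for d in range(dim)], rng=rng)
                for x in workers
            ]
            avg = [sum(dl[d] for dl in deltas) / num_workers for d in range(dim)]
            synced = [synced[d] + avg[d] for d in range(dim)]
            workers = [list(synced) for _ in range(num_workers)]

    return synced, target


if __name__ == "__main__":
    synced, target = local_sgd_quantized()
    print("max coordinate error:", max(abs(s - t) for s, t in zip(synced, target)))
```

Quantizing the change since the last synchronization, rather than the raw parameters, keeps the quantization error proportional to how far the workers have moved, which is why it shrinks as the iterates settle in this toy setting.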


Your representations are in the network: composable and parallel adaptation for large scale models

Neural Information Processing Systems

On the ViT-L/16 architecture, our experiments show that a single adapter, 1.3% of the full model, is able to reach full fine-tuning accuracy on average across 11 challenging downstream classification tasks. Compared with other forms of parameter-efficient adaptation, the isolated nature of the InCA adaptation is computationally desirable for large-scale models. For instance, we adapt ViT-G/14 (1.8B+ parameters) quickly with 20+ adapters in parallel on a single V100 GPU (76% GPU memory reduction) and exhaustively identify its




SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator

Neural Information Processing Systems

We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of O(min(n^{1/2} ε^{-2}, ε^{-3})) to find an ε-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
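Reading the cost bound above as O(min(n^{1/2} ε^{-2}, ε^{-3})), the first term is the smaller one exactly when n^{1/2} < ε^{-1}, that is, when n < ε^{-2}. The short sketch below evaluates both terms for given n and ε; it drops the constants hidden by the O-notation, and the function names `cost_bound` and `smaller_term` are hypothetical helpers introduced here, not part of the paper.

```python
import math


def cost_bound(n, eps):
    """Return min(n^(1/2) * eps^-2, eps^-3), ignoring constant factors."""
    sqrt_n_term = math.sqrt(n) * eps ** -2
    eps_cubed_term = eps ** -3
    return min(sqrt_n_term, eps_cubed_term)


def smaller_term(n, eps):
    """Report which of the two terms attains the minimum."""
    sqrt_n_term = math.sqrt(n) * eps ** -2
    eps_cubed_term = eps ** -3
    return "n^(1/2) * eps^-2" if sqrt_n_term <= eps_cubed_term else "eps^-3"


if __name__ == "__main__":
    for n, eps in [(100, 0.01), (10_000, 0.1)]:
        print(n, eps, cost_bound(n, eps), smaller_term(n, eps))
```

With n = 100 and ε = 0.01 we have n < ε^{-2} = 10,000, so the n-dependent term is the smaller one; with n = 10,000 and ε = 0.1 we have n > ε^{-2} = 100, so the ε^{-3} term is.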




Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited

Neural Information Processing Systems

In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions (p ≪ n), we first show that if the loss function is (∞, T)-smooth, we can avoid a dependence of the sample complexity, to achieve error α, on the exponential of the dimensionality p with base 1/α (i.e., α^{-p}), which answers a question in [19].