Your representations are in the network: composable and parallel adaptation for large scale models
On the ViT-L/16 architecture, our experiments show that a single adapter, 1.3% of the full model, is able to reach full fine-tuning accuracy on average across 11 challenging downstream classification tasks. Compared with other forms of parameter-efficient adaptation, the isolated nature of the InCA adaptation is computationally desirable for large-scale models. For instance, we adapt ViT-G/14 (1.8B+ parameters) quickly with 20+ adapters in parallel on a single V100 GPU (76% GPU memory reduction) and exhaustively identify its
SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator
Cong Fang, Chris Junchi Li, Zhouchen Lin, Tong Zhang
We provide a few error-bound results on its convergence rates. Specifically, we prove that the SPIDER-SFO algorithm achieves a gradient computation cost of $\mathcal{O}(\min(n^{1/2}\epsilon^{-2}, \epsilon^{-3}))$ to find an $\epsilon$-approximate first-order stationary point. In addition, we prove that SPIDER-SFO nearly matches the algorithmic lower bound for finding a stationary point under the gradient Lipschitz assumption in the finite-sum setting.
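To make the cost bound above concrete, the following is a minimal sketch that evaluates min(n^{1/2} ε^{-2}, ε^{-3}) with all constants dropped, reading n as the number of component functions and ε as the target accuracy. The function names and the labels "n-dependent" and "n-independent" are illustrative assumptions and do not come from the paper; the sketch only evaluates the stated rate and is not an implementation of SPIDER-SFO.

```python
import math


def spider_sfo_cost(n: int, eps: float) -> float:
    """Evaluate min(n^{1/2} * eps^{-2}, eps^{-3}), ignoring constants.

    Hypothetical helper for illustration only: it evaluates the rate
    quoted in the abstract, not the algorithm itself.
    """
    n_dependent = math.sqrt(n) * eps ** -2   # n^{1/2} * eps^{-2}
    n_independent = eps ** -3                # eps^{-3}
    return min(n_dependent, n_independent)


def active_term(n: int, eps: float) -> str:
    """Report which of the two terms attains the minimum.

    The n-dependent term is the smaller one exactly when
    n^{1/2} <= eps^{-1}, i.e. when n <= eps^{-2}.
    """
    if math.sqrt(n) * eps ** -2 <= eps ** -3:
        return "n-dependent"
    return "n-independent"


if __name__ == "__main__":
    eps = 0.25  # eps^{-2} = 16, eps^{-3} = 64
    for n in (4, 16, 256):
        print(n, spider_sfo_cost(n, eps), active_term(n, eps))
```

Under this reading, the two terms cross at n = ε^{-2}: for smaller n the n-dependent term governs the bound, and for larger n the bound no longer grows with n.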
Empirical Risk Minimization in Non-interactive Local Differential Privacy Revisited
Di Wang, Marco Gaboardi, Jinhui Xu
In this paper, we revisit the Empirical Risk Minimization problem in the non-interactive local model of differential privacy. In the case of constant or low dimensions ($p \ll n$), we first show that if the loss function is $(\infty, T)$-smooth, we can avoid a dependence of the sample complexity, to achieve error $\alpha$, on the exponential of the dimensionality $p$ with base $1/\alpha$ (i.e., $\alpha^{-p}$), which answers a question in [19].