Zeyuan Allen-Zhu
Even Faster SVD Decomposition Yet Without Agonizing Pain
Zeyuan Allen-Zhu, Yuanzhi Li
We study k-SVD, the problem of computing the first k singular vectors of a matrix A. Recently, several breakthroughs have been made on k-SVD: Musco and Musco [19] proved the first gap-free convergence result using the block Krylov method, Shamir [21] discovered the first variance-reduction stochastic method, and Bhojanapalli et al. [7] provided the fastest O(nnz(A) + poly(1/ε))-time algorithm using alternating minimization. In this paper, we put forward a new and simple LazySVD framework that improves upon these breakthroughs. This framework leads to a faster gap-free method outperforming [19], and the first accelerated and stochastic method outperforming [21]. In the O(nnz(A) + poly(1/ε)) running-time regime, LazySVD outperforms [7] in certain parameter regimes without even using alternating minimization.
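For intuition, below is a minimal illustrative sketch of the k-SVD task solved one singular vector at a time with deflation, which is the high-level spirit of a "lazy" framework; the plain power-iteration subroutine and all parameters are illustrative assumptions, not the paper's accelerated or stochastic solvers.

```python
import numpy as np

def top_singular_vector(M, num_iters=100, seed=0):
    """Approximate the top right singular vector of M by power iteration on M^T M."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(M.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(num_iters):
        v = M.T @ (M @ v)
        v /= np.linalg.norm(v)
    return v

def k_svd_deflation(A, k, num_iters=100):
    """Compute k approximate top right singular vectors one at a time,
    deflating (projecting out) each found direction before computing the next."""
    A_deflated = A.astype(float).copy()
    V = []
    for _ in range(k):
        v = top_singular_vector(A_deflated, num_iters)
        V.append(v)
        # Deflate: remove the component along v from every row of the matrix.
        A_deflated = A_deflated - np.outer(A_deflated @ v, v)
    return np.column_stack(V)

# Usage: compare against numpy's exact SVD on a small random matrix.
A = np.random.default_rng(1).standard_normal((50, 30))
V_approx = k_svd_deflation(A, k=3)
_, _, Vt = np.linalg.svd(A)
print(np.abs(np.diag(V_approx.T @ Vt[:3].T)))  # entries should be close to 1
```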
Exploiting the Structure: Stochastic Gradient Methods Using Raw Clusters
Zeyuan Allen-Zhu, Yang Yuan, Karthik Sridharan
The amount of data available in the world is growing faster than our ability to process it. However, if we take advantage of its internal structure, the data can become much smaller for machine learning purposes. In this paper we focus on one of the fundamental machine learning tasks, empirical risk minimization (ERM), and provide faster algorithms with the help of the clustering structure of the data. We introduce a simple notion of raw clustering that can be efficiently computed from the data, and propose two algorithms based on this clustering information. Our accelerated algorithm ClusterACDM is built on a novel Haar transformation applied to the dual space of the ERM problem, and our variance-reduction based algorithm ClusterSVRG introduces a new gradient estimator using clustering. Our algorithms outperform their classical counterparts ACDM and SVRG, respectively.
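For context, the sketch below shows the classical SVRG gradient estimator that ClusterSVRG builds on: at each inner step the estimator is grad_i(x, i) - grad_i(x_snapshot, i) + full_grad(x_snapshot). This is the standard baseline, not the clustering-based estimator from the paper, and the step size, epoch count, and least-squares example are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, x0, n, step_size=0.01, num_epochs=30, inner_steps=None, seed=0):
    """Classical SVRG: each epoch computes a full gradient at a snapshot point,
    then takes inner steps using the variance-reduced estimator
        g = grad_i(x, i) - grad_i(x_snap, i) + full_grad(x_snap)."""
    rng = np.random.default_rng(seed)
    inner_steps = inner_steps or n
    x = x0.copy()
    for _ in range(num_epochs):
        x_snap = x.copy()
        full_grad = sum(grad_i(x_snap, i) for i in range(n)) / n
        for _ in range(inner_steps):
            i = rng.integers(n)
            g = grad_i(x, i) - grad_i(x_snap, i) + full_grad
            x = x - step_size * g
    return x

# Usage: least-squares ERM, f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(1)
A, b = rng.standard_normal((200, 10)), rng.standard_normal(200)
grad_i = lambda x, i: (A[i] @ x - b[i]) * A[i]
x_star = np.linalg.lstsq(A, b, rcond=None)[0]
x_hat = svrg(grad_i, np.zeros(10), n=200)
print(np.linalg.norm(x_hat - x_star))  # should be small
```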
Optimal Black-Box Reductions Between Optimization Objectives
Zeyuan Allen-Zhu, Elad Hazan
The diverse world of machine learning applications has given rise to a plethora of algorithms and optimization methods, finely tuned to the specific regression or classification task at hand. We reduce the complexity of algorithm design in machine learning via reductions: we develop reductions that take a method developed for one setting and apply it to the entire spectrum of smoothness and strong-convexity in applications. Furthermore, unlike existing results, our new reductions are optimal and more practical. We show how these new reductions give rise to new, faster running times for training linear classifiers over various families of loss functions, and conclude with experiments showing that they also succeed in practice.
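As a point of comparison, the sketch below illustrates the classical regularization reduction from a smooth convex objective to a strongly convex one by adding (sigma/2)·||x||^2, which is the kind of standard reduction this line of work improves upon; it is not the paper's optimal reduction, and the regularization weight, solver, and example problem are illustrative assumptions.

```python
import numpy as np

def reduce_to_strongly_convex(f, grad_f, sigma):
    """Classical regularization reduction: given a smooth convex (but not strongly
    convex) objective f, return the sigma-strongly-convex surrogate
        F(x) = f(x) + (sigma / 2) * ||x||^2,
    so that any solver designed for strongly convex problems can be applied.
    The added bias is controlled by choosing sigma on the order of the target accuracy."""
    F = lambda x: f(x) + 0.5 * sigma * np.dot(x, x)
    grad_F = lambda x: grad_f(x) + sigma * x
    return F, grad_F

# Usage: smooth but non-strongly-convex least squares (rank-deficient design),
# solved by plain gradient descent on the regularized surrogate.
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20)) @ rng.standard_normal((20, 50))  # rank <= 20
b = rng.standard_normal(100)
f = lambda x: 0.5 * np.mean((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b) / A.shape[0]

sigma = 1e-3
F, grad_F = reduce_to_strongly_convex(f, grad_f, sigma)
L = np.linalg.norm(A, 2) ** 2 / A.shape[0] + sigma  # smoothness constant of F
x = np.zeros(50)
for _ in range(2000):
    x -= (1.0 / L) * grad_F(x)
print(f(x))  # approaches the minimum value of the original objective f
```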
Can SGD Learn Recurrent Neural Networks with Provable Generalization?
Zeyuan Allen-Zhu, Yuanzhi Li
Recurrent Neural Networks (RNNs) are among the most popular models in sequential data analysis. Yet, in the foundational PAC learning language, what concept class can they learn? Moreover, how can the same recurrent unit simultaneously learn functions from different input tokens to different output tokens, without these functions interfering with one another?
What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu, Yuanzhi Li
How can neural networks such as ResNet efficiently learn CIFAR-10 with test accuracy above 96%, while other methods, especially kernel methods, fall relatively behind? Can we provide theoretical justification for this gap? Recently, an influential line of work has related neural networks to kernels in the over-parameterized regime, proving that they can learn certain concept classes that are also learnable by kernels, with similar test error. Yet, can neural networks provably learn some concept class better than kernels? We answer this question positively in the distribution-free setting.
NEON2: Finding Local Minima via First-Order Oracles
Zeyuan Allen-Zhu, Yuanzhi Li
Is Q-Learning Provably Efficient?
Chi Jin, Zeyuan Allen-Zhu, Sebastien Bubeck, Michael I. Jordan
Natasha 2: Faster Non-Convex Optimization Than SGD
Zeyuan Allen-Zhu