eigensystem
Predicting kernel regression learning curves from only raw data statistics
Karkada, Dhruva, Turnbull, Joseph, Liu, Yuxi, Simon, James B.
We study kernel regression with common rotation-invariant kernels on real datasets including CIFAR-5m, SVHN, and ImageNet. We give a theoretical framework that predicts learning curves (test risk vs. sample size) from only two measurements: the empirical data covariance matrix and an empirical polynomial decomposition of the target function $f_*$. The key new idea is an analytical approximation of a kernel's eigenvalues and eigenfunctions with respect to an anisotropic data distribution. The eigenfunctions resemble Hermite polynomials of the data, so we call this approximation the Hermite eigenstructure ansatz (HEA). We prove the HEA for Gaussian data, but we find that real image data is often "Gaussian enough" for the HEA to hold well in practice, enabling us to predict learning curves by applying prior results relating kernel eigenstructure to test risk. Extending beyond kernel regression, we empirically find that MLPs in the feature-learning regime learn Hermite polynomials in the order predicted by the HEA. Our HEA framework is a proof of concept that an end-to-end theory of learning which maps dataset structure all the way to model performance is possible for nontrivial learning algorithms on real datasets.
- North America > United States > Rhode Island > Providence County > Providence (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > West Midlands > Birmingham (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
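The HEA's central claim is easy to probe numerically in the Gaussian case the paper proves. The sketch below is our illustration, not the authors' code: it builds an RBF kernel matrix on 1-D standard-Gaussian samples, eigendecomposes it, and correlates the top empirical eigenvectors with probabilists' Hermite polynomials; the bandwidth and sample size are arbitrary choices.

```python
# Minimal sketch (not the paper's code): for 1-D standard-Gaussian inputs,
# the top eigenfunctions of a rotation-invariant (here RBF) kernel should
# resemble Hermite polynomials, as the HEA predicts. We check this by
# eigendecomposing the empirical kernel matrix and correlating its
# eigenvectors with probabilists' Hermite polynomials He_k on the samples.
import numpy as np
from numpy.polynomial.hermite_e import hermeval

rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)                      # Gaussian data, d = 1

# RBF kernel matrix K_ij = exp(-(x_i - x_j)^2 / (2 * w^2)), w illustrative
w = 1.0
K = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * w ** 2))

# Empirical kernel eigensystem: eigenvalues of K / n approximate the
# kernel operator's eigenvalues under the data distribution.
evals, evecs = np.linalg.eigh(K / n)
evals, evecs = evals[::-1], evecs[:, ::-1]      # sort descending

# Compare the k-th eigenvector to samples of the k-th Hermite polynomial.
for k in range(4):
    coef = np.zeros(k + 1)
    coef[k] = 1.0
    he_k = hermeval(x, coef)                    # He_k evaluated on the data
    he_k /= np.linalg.norm(he_k)
    corr = abs(evecs[:, k] @ he_k)              # |cos angle|, sign-invariant
    print(f"eigval {k}: {evals[k]:.4f}   |corr with He_{k}|: {corr:.3f}")
```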
Toward Large Kernel Models
Abedsoltan, Amirhesam, Belkin, Mikhail, Pandit, Parthe
Recent studies indicate that kernel machines can often perform comparably to or better than deep neural networks (DNNs) on small datasets. Interest in kernel machines has been further bolstered by the discovery of their equivalence to wide neural networks in certain regimes. However, a key feature of DNNs is their ability to scale the model size and training data size independently, whereas in traditional kernel machines model size is tied to data size. Because of this coupling, scaling kernel machines to large data has been computationally challenging. In this paper, we provide a way forward for constructing large-scale general kernel models, a generalization of kernel machines that decouples the model from the data, allowing training on large datasets. Specifically, we introduce EigenPro 3.0, an algorithm based on projected dual preconditioned SGD, and show scaling to model and data sizes that have not been possible with existing kernel methods.
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > India (0.04)
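As a rough illustration of the model/data decoupling (not of EigenPro 3.0 itself, whose projected dual preconditioned SGD we do not reproduce), the sketch below trains a general kernel model with p centers drawn independently of the n training points, using plain minibatch SGD on squared loss; all sizes and hyperparameters are illustrative.

```python
# Minimal sketch of a "general kernel model": f(x) = sum_j a_j K(z_j, x)
# uses p centers z_j chosen independently of the n training points, so model
# size and data size scale separately. Plain minibatch SGD stands in for
# EigenPro 3.0's projected dual preconditioned SGD (not reproduced here).
import numpy as np

rng = np.random.default_rng(0)

def rbf(A, B, w):
    # Pairwise RBF kernel between rows of A and rows of B.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * w ** 2))

n, p, d = 2000, 200, 10                 # n data points, p << n centers
w = np.sqrt(d)                          # bandwidth scaled to the dimension
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(n)

Z = rng.standard_normal((p, d))         # centers independent of the data
a = np.zeros(p)                         # model weights
lr, batch = 0.01, 64

for step in range(2000):
    idx = rng.integers(0, n, batch)
    Kb = rbf(X[idx], Z, w)              # (batch, p) kernel features
    resid = Kb @ a - y[idx]
    a -= lr * Kb.T @ resid / batch      # SGD step on 0.5 * ||f(X) - y||^2

print("train MSE:", np.mean((rbf(X, Z, w) @ a - y) ** 2))
```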
Stochastic Optimization for Kernel PCA
Zhang, Lijun (Nanjing University) | Yang, Tianbao (University of Iowa) | Yi, Jinfeng (IBM Thomas J. Watson Research Center) | Jin, Rong (Alibaba Group) | Zhou, Zhi-Hua (Nanjing University)
Kernel Principal Component Analysis (PCA) is a popular extension of PCA that can find nonlinear patterns in data. However, applying kernel PCA to large-scale problems remains a significant challenge due to its quadratic space complexity and cubic time complexity in the number of examples. To address this limitation, we use techniques from stochastic optimization to solve kernel PCA with linear space and time complexities per iteration. Specifically, we formulate it as a stochastic composite optimization problem, in which a nuclear norm regularizer is introduced to promote low-rankness, and then develop a simple algorithm based on stochastic proximal gradient descent. During the optimization process, the proposed algorithm always maintains a low-rank factorization of iterates that can be conveniently held in memory. Compared to previous iterative approaches, a remarkable property of our algorithm is that it comes with an explicit rate of convergence: theoretical analysis shows that its solution converges to the optimal one at an O(1/T) rate, where T is the number of iterations.
- Asia > China > Jiangsu Province > Nanjing (0.04)
- South America > Paraguay > Asunción > Asunción (0.04)
- North America > United States > Iowa > Johnson County > Iowa City (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
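The composite formulation is easy to sketch. Below is a minimal illustration (our construction, not the paper's code) of stochastic proximal gradient on a nuclear-norm-regularized covariance fit, using an explicit random-feature space to stand in for the RKHS; the proximal step soft-thresholds singular values, which is what keeps the iterates low-rank. The feature map, regularizer weight, and step size are illustrative, and we skip the paper's factored storage of iterates.

```python
# Minimal sketch of the stochastic composite approach: minimize
#   E_batch[ 0.5 * ||C_batch - M||_F^2 ] + lam * ||M||_*
# by stochastic proximal gradient, where C_batch is a minibatch covariance
# estimate in feature space and the prox of the nuclear norm soft-thresholds
# singular values, keeping the iterate low-rank.
import numpy as np

rng = np.random.default_rng(0)
n, d, D = 10000, 5, 100                  # data dim d, random-feature dim D
X = rng.standard_normal((n, d))

# Random Fourier features approximating an RBF kernel (Rahimi & Recht).
W = rng.standard_normal((d, D))
b = rng.uniform(0, 2 * np.pi, D)
phi = lambda A: np.sqrt(2.0 / D) * np.cos(A @ W + b)

M = np.zeros((D, D))
lam, lr, batch = 0.05, 0.5, 128
for t in range(300):
    idx = rng.integers(0, n, batch)
    F = phi(X[idx])
    C_batch = F.T @ F / batch            # minibatch covariance in feature space
    G = M - C_batch                      # stochastic gradient of the smooth term
    U, s, Vt = np.linalg.svd(M - lr * G, full_matrices=False)
    s = np.maximum(s - lr * lam, 0.0)    # prox: singular value soft-thresholding
    M = (U * s) @ Vt

print("rank of final iterate:", int((s > 0).sum()))
```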
Learning Eigenvectors for Free
Koolen, Wouter M., Kotlowski, Wojciech, Warmuth, Manfred K.
We extend the classical problem of predicting a sequence of outcomes from a finite alphabet to the matrix domain. In this extension, the alphabet of $n$ outcomes is replaced by the set of all dyads, i.e. outer products $\mathbf{u}\mathbf{u}^\top$ where $\mathbf{u}$ is a unit-length vector in $\mathbb{R}^n$. Whereas in the classical case the goal is to learn (i.e. sequentially predict as well as) the best multinomial distribution, in the matrix case we want to learn the density matrix that best explains the observed sequence of dyads. We show how popular online algorithms for learning a multinomial distribution can be extended to learn density matrices. Intuitively, learning the $n^2$ parameters of a density matrix is much harder than learning the $n$ parameters of a multinomial distribution. Surprisingly, we prove that the worst-case regrets of certain classical algorithms and their matrix generalizations are identical. The reason is that the worst-case sequences of dyads share a common eigensystem, i.e. the worst-case regret is achieved in the classical case. So these matrix algorithms learn the eigenvectors without any regret.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Oceania > Australia > Australian Capital Territory > Canberra (0.05)
- North America > United States > New York (0.05)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)
- Asia > Middle East > Jordan (0.04)
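The matrix generalization can be sketched with one plausible instance (our illustration; the paper analyzes several classical algorithms and their matrix counterparts): a matrix exponentiated-gradient learner over density matrices, the matrix analogue of multiplicative weights over a multinomial. The log loss and the learning rate below are our assumptions, not taken from the paper.

```python
# Minimal sketch of a matrix exponentiated-gradient learner for density
# matrices. Each round we observe a dyad u u^T with ||u|| = 1, suffer the
# log loss -log tr(W u u^T), and update W <- exp(log W - eta * grad),
# renormalized to unit trace. Loss and eta are illustrative choices.
import numpy as np

def sym_logm(A):
    # Matrix logarithm of a symmetric positive-definite matrix via eigh.
    w, V = np.linalg.eigh(A)
    return (V * np.log(w)) @ V.T

def sym_expm(A):
    # Matrix exponential of a symmetric matrix via eigh.
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

rng = np.random.default_rng(0)
n, T, eta = 4, 500, 0.1
W = np.eye(n) / n                        # start at the maximally mixed state
total = 0.0
for t in range(T):
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)               # random unit-length dyad direction
    p = u @ W @ u                        # tr(W u u^T): predicted mass on the dyad
    total += -np.log(p)
    grad = -np.outer(u, u) / p           # gradient of -log tr(W u u^T) in W
    W = sym_expm(sym_logm(W) - eta * grad)
    W /= np.trace(W)                     # project back to unit trace
print(f"avg log loss over {T} rounds: {total / T:.3f}")
```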