krr
Provable and Efficient Dataset Distillation for Kernel Ridge Regression
Deep learning models are now trained on increasingly large datasets, making it crucial to reduce computational costs and improve data quality. Dataset distillation aims to distill a large dataset into a small synthesized dataset such that models trained on it can achieve similar performance to those trained on the original dataset. While there have been many empirical efforts to improve dataset distillation algorithms, a thorough theoretical analysis and provable, efficient algorithms are still lacking. In this paper, by focusing on dataset distillation for kernel ridge regression (KRR), we show that one data point per class is already necessary and sufficient to recover the original model's performance in many settings. For linear ridge regression and KRR with surjective feature mappings, we provide necessary and sufficient conditions for the distilled dataset to recover the original model's parameters. For KRR with injective feature mappings of deep neural networks, we show that while one data point per class is not sufficient in general, $k+1$ data points can be sufficient for deep linear neural networks, where $k$ is the number of classes. Our theoretical results enable directly constructing analytical solutions for distilled datasets, resulting in a provable and efficient dataset distillation algorithm for KRR. We verify our theory experimentally and show that our algorithm outperforms previous work such as KIP while being significantly more efficient.
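As a toy illustration of what "analytical dataset distillation" for ridge regression means, the sketch below constructs $d$ synthetic points whose ridge solution matches the full-data solution exactly. This is not the paper's one-point-per-class construction; the identity design `X_syn`, the scale `c`, and the problem sizes are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Original dataset: n points in d dimensions with a linear target.
n, d, lam = 500, 10, 1e-2
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)

# Ridge regression solution on the full dataset.
w_full = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Analytical "distillation" into d synthetic points: a scaled identity design
# whose ridge solution reproduces w_full exactly, because
# (X_syn^T X_syn + lam I)^{-1} X_syn^T y_syn = ((c^2 + lam) I)^{-1} c y_syn.
c = 1.0
X_syn = c * np.eye(d)
y_syn = ((c**2 + lam) / c) * w_full

w_distilled = np.linalg.solve(X_syn.T @ X_syn + lam * np.eye(d), X_syn.T @ y_syn)
print(np.allclose(w_full, w_distilled))  # -> True
```

The point of the sketch is only that, once the target parameters are known, a small synthetic dataset reproducing them can be written down in closed form; the paper's results characterize when far fewer points (down to one per class) suffice.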
On the Saturation Effects of Spectral Algorithms in Large Dimensions
The saturation effects, which originally refer to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the regression function is over-smooth, have been observed for almost 20 years and were only recently proved rigorously for KRR and some other spectral algorithms over a fixed-dimensional domain. The main focus of this paper is to explore the saturation effects of a large class of spectral algorithms (including KRR, gradient descent, etc.) in large-dimensional settings where $n \asymp d^{\gamma}$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large-dimensional settings and show that gradient flow with an early stopping strategy yields an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results for KRR, we can further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimally tuned) spectral algorithms with different qualifications $\tau$. In particular, we find that these exact rate curves (varying along $\gamma$) exhibit periodic plateau behavior and a polynomial approximation barrier. Consequently, we can fully depict the saturation effects of spectral algorithms and reveal a new phenomenon in large-dimensional settings: the saturation effect occurs in the large-dimensional setting as soon as the source condition $s > \tau$ holds, whereas it occurs in the fixed-dimensional setting only when $s > 2\tau$.
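For reference, a standard way to formalize spectral algorithms and the qualification $\tau$ mentioned above (the notation and normalization here are assumptions and may differ from the paper's conventions): a spectral algorithm is defined by a filter function $\varphi_\lambda$ applied to the empirical kernel integral operator $T_n$, giving the estimator $\hat f_\lambda = \varphi_\lambda(T_n)\, g_n$ with $g_n = \frac{1}{n}\sum_{i=1}^{n} y_i\, K(x_i, \cdot)$, and the filter has qualification $\tau$ if $\sup_{t \in (0,\kappa^2]} |1 - t\,\varphi_\lambda(t)|\, t^{\alpha} \le c_\alpha \lambda^{\alpha}$ for all $0 \le \alpha \le \tau$. Under this convention, KRR corresponds to $\varphi_\lambda(t) = (t+\lambda)^{-1}$ and has qualification $1$, whereas gradient flow with early stopping corresponds to $\varphi_{1/T}(t) = (1 - e^{-tT})/t$, whose qualification can be taken arbitrarily large, consistent with its ability to escape the saturation threshold described above.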