Coresets for Multiple $\ell_p$ Regression
Woodruff, David P., Yasuda, Taisuke
A coreset of a dataset with $n$ examples and $d$ features is a weighted subset of examples that is sufficient for solving downstream data analytic tasks. Nearly optimal constructions of coresets for least squares and $\ell_p$ linear regression with a single response are known in prior work. However, for multiple $\ell_p$ regression where there can be $m$ responses, there are no known constructions with size sublinear in $m$. In this work, we construct coresets of size $\tilde O(\varepsilon^{-2}d)$ for $p<2$ and $\tilde O(\varepsilon^{-p}d^{p/2})$ for $p>2$ independently of $m$ (i.e., dimension-free) that approximate the multiple $\ell_p$ regression objective at every point in the domain up to $(1\pm\varepsilon)$ relative error. If we only need to preserve the minimizer subject to a subspace constraint, we improve these bounds by an $\varepsilon$ factor for all $p>1$. All of our bounds are nearly tight. We give two application of our results. First, we settle the number of uniform samples needed to approximate $\ell_p$ Euclidean power means up to a $(1+\varepsilon)$ factor, showing that $\tilde\Theta(\varepsilon^{-2})$ samples for $p = 1$, $\tilde\Theta(\varepsilon^{-1})$ samples for $1 < p < 2$, and $\tilde\Theta(\varepsilon^{1-p})$ samples for $p>2$ is tight, answering a question of Cohen-Addad, Saulpic, and Schwiegelshohn. Second, we show that for $1
Jun-4-2024
- Country:
- Asia
- Afghanistan > Parwan Province
- Charikar (0.04)
- India > Karnataka
- Bengaluru (0.04)
- Middle East > Israel
- Jerusalem District > Jerusalem (0.04)
- Afghanistan > Parwan Province
- Europe
- France > Île-de-France
- Italy
- Spain > Catalonia
- Barcelona Province > Barcelona (0.04)
- Switzerland
- Basel-City > Basel (0.04)
- Zürich > Zürich (0.14)
- United Kingdom > England
- Cambridgeshire > Cambridge (0.04)
- North America > United States
- Pennsylvania > Allegheny County
- Pittsburgh (0.04)
- California
- Alameda County > Berkeley (0.04)
- San Diego County > San Diego (0.04)
- Santa Clara County
- Colorado
- Boulder County > Boulder (0.04)
- Denver County > Denver (0.04)
- Washington > King County
- Seattle (0.14)
- Illinois > Cook County
- Chicago (0.04)
- Oregon > Multnomah County
- Portland (0.04)
- Hawaii > Honolulu County
- Honolulu (0.04)
- Arizona > Maricopa County
- Phoenix (0.04)
- Florida
- Miami-Dade County > Miami (0.04)
- Orange County > Orlando (0.04)
- Pennsylvania > Allegheny County
- Asia
- Genre:
- Research Report > New Finding (0.34)
- Technology: