Sharper Bounds for $\ell_p$ Sensitivity Sampling

Jan-3-2024–arXiv.org Machine Learning

In large scale machine learning, random sampling is a popular way to approximate datasets by a small representative subset of examples. In particular, sensitivity sampling is an intensely studied technique which provides provable guarantees on the quality of approximation, while reducing the number of examples to the product of the VC dimension $d$ and the total sensitivity $\mathfrak S$ in remarkably general settings. However, guarantees going beyond this general bound of $\mathfrak S d$ are known in perhaps only one setting, for $\ell_2$ subspace embeddings, despite intense study of sensitivity sampling in prior work. In this work, we show the first bounds for sensitivity sampling for $\ell_p$ subspace embeddings for $p > 2$ that improve over the general $\mathfrak S d$ bound, achieving a bound of roughly $\mathfrak S^{2-2/p}$ for $2

artificial intelligence, machine learning, sensitivity, (17 more...)

arXiv.org Machine Learning

Jan-3-2024

arXiv.org PDF

Add feedback

Country:
- North America
  - United States
    - Nevada (0.04)
    - Massachusetts (0.04)
    - Texas > Travis County
      - Austin (0.04)
    - Utah > Salt Lake County
      - Salt Lake City (0.04)
    - Oregon > Multnomah County
      - Portland (0.04)
    - Illinois > Cook County
      - Chicago (0.04)
    - Virginia > Alexandria County
      - Alexandria (0.04)
    - Colorado
      - Denver County > Denver (0.04)
      - Boulder County > Boulder (0.04)
    - North Carolina > Durham County
      - Durham (0.04)
    - California
      - San Diego County > San Diego (0.04)
      - Santa Clara County > San Jose (0.04)
      - Alameda County > Berkeley (0.04)
    - Pennsylvania > Allegheny County
      - Pittsburgh (0.04)
  - Canada
    - Quebec > Montreal (0.04)
    - British Columbia > Metro Vancouver Regional District
      - Vancouver (0.04)
- Europe
  - United Kingdom > England
    - Cambridgeshire > Cambridge (0.04)
  - Switzerland > Basel-City
    - Basel (0.04)
  - Italy > Lazio
    - Rome (0.04)
  - France > Île-de-France
    - Paris > Paris (0.04)
- Asia
  - Middle East > Israel (0.04)
  - India > Telangana
    - Hyderabad (0.04)
  - Afghanistan > Parwan Province
    - Charikar (0.04)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning (0.46)
  - Computational Learning Theory (0.34)