Better scalability under potentially heavy-tailed feedback

Dec-14-2020, arXiv.org Machine Learning

We study scalable alternatives to robust gradient descent (RGD) techniques that can be used when the losses and/or gradients may be heavy-tailed, a fact that is unknown to the learner. The core technique is simple: rather than robustly aggregating gradients at each step, which is costly and leads to sub-optimal dimension dependence in risk bounds, we focus computational effort on robustly choosing (or newly constructing) a strong candidate from a collection of cheap stochastic sub-processes that can be run in parallel. The exact selection process depends on the convexity of the underlying objective, but in all cases our selection technique amounts to a robust form of boosting the confidence of weak learners. In addition to formal guarantees, we provide an empirical analysis of robustness to perturbations of the experimental conditions, under both sub-Gaussian and heavy-tailed data, along with applications to a variety of benchmark datasets. The overall take-away is an extensible procedure that is simple to implement and trivial to parallelize, and that keeps the formal merits of RGD methods while scaling much better to large learning problems.
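The divide-and-select idea described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's algorithm: the abstract does not specify the sub-processes or the selection rule, so the sketch assumes a least-squares loss, one plain SGD pass per disjoint data partition, and a generic selection rule that keeps the candidate with the smallest median distance to all candidates. The function names `sgd_candidate`, `select_by_median_distance`, and `robust_divide_and_conquer` are hypothetical.

```python
import numpy as np


def sgd_candidate(X, y, step=0.01, seed=0):
    """One cheap sub-process: a single SGD pass on squared error (assumed loss)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for i in rng.permutation(len(y)):
        grad = (X[i] @ w - y[i]) * X[i]  # gradient of 0.5 * (x.w - y)^2
        w -= step * grad
    return w


def select_by_median_distance(candidates):
    """Keep the candidate whose median distance to all candidates
    (itself included) is smallest. An assumed, generic robust selection
    rule; a few wild candidates cannot move the median by much."""
    C = np.asarray(candidates, dtype=float)
    dists = np.linalg.norm(C[:, None, :] - C[None, :, :], axis=2)
    return C[np.argmin(np.median(dists, axis=1))]


def robust_divide_and_conquer(X, y, k=5, step=0.01, seed=0):
    """Split the data into k disjoint parts, run one sub-process per part
    (these runs are independent, so they could be parallelized), then
    robustly select a single candidate."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(len(y)), k)
    candidates = [
        sgd_candidate(X[p], y[p], step=step, seed=seed + j)
        for j, p in enumerate(parts)
    ]
    return select_by_median_distance(candidates)
```

Note that the robust step runs once, over k candidate vectors, rather than over per-example gradients at every iteration, which is where the scalability argument in the abstract comes from.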

convexity, excess risk, procedure

Country:
- North America > Canada
  - Ontario > Toronto (0.04)
- Europe > United Kingdom
  - England
    - Cambridgeshire > Cambridge (0.04)
    - Oxfordshire > Oxford (0.04)
- Asia
  - Middle East > Israel
    - Jerusalem District > Jerusalem (0.04)
  - Japan > Honshū
    - Kansai > Osaka Prefecture > Osaka (0.04)

Genre:
- Research Report > New Finding (0.45)

Industry:
- Education (0.65)

Technology:
- Information Technology > Artificial Intelligence
  - Representation & Reasoning (0.93)
  - Machine Learning > Statistical Learning
    - Gradient Descent (0.35)
