Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Surbhi Goel, Sushrut Karmalkar, Adam Klivans

Nov-4-2019–arXiv.org Machine Learning

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let $\mathsf{opt} < 1$ be the population loss of the best-fitting ReLU. We prove:

1. Finding a ReLU with square-loss $\mathsf{opt} + \epsilon$ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply, unconditionally, that gradient descent cannot converge to the global minimum in polynomial time.

2. There exists an efficient approximation algorithm for finding the best-fitting ReLU that achieves error $O(\mathsf{opt}^{2/3})$. The algorithm uses a novel reduction to noisy halfspace learning with respect to $0/1$ loss.

Prior work due to Soltanolkotabi [Sol17] showed that gradient descent can find the best-fitting ReLU with respect to Gaussian marginals, if the training set is exactly labeled by a ReLU.
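To make the objective concrete, here is a minimal sketch of the empirical square-loss of a ReLU on examples drawn from a spherical Gaussian. It is an illustration only, not the paper's algorithm: the function names (`empirical_square_loss`, `draw_gaussian_examples`), the dimension, the sample size, and the hidden weight vector `w_star` are all assumptions chosen for the example. The labeling function is a parameter, so arbitrary labels can be substituted; the usage lines show the exactly-labeled case.

```python
import random


def relu(z):
    """ReLU activation: max(0, z)."""
    return max(0.0, z)


def empirical_square_loss(w, samples):
    """Average of (relu(<w, x>) - y)^2 over a list of (x, y) pairs."""
    total = 0.0
    for x, y in samples:
        pred = relu(sum(wi * xi for wi, xi in zip(w, x)))
        total += (pred - y) ** 2
    return total / len(samples)


def draw_gaussian_examples(n, dim, label_fn, seed=0):
    """Draw n examples with x ~ N(0, I_dim); labels come from label_fn(x).

    label_fn is a hypothetical hook: it may be any function of x,
    matching the setting where the labels can be arbitrary.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        samples.append((x, label_fn(x)))
    return samples


# Exactly-labeled case: labels produced by a ReLU with assumed unit-norm weights.
w_star = [0.6, 0.8, 0.0]
samples = draw_gaussian_examples(
    2000, 3, lambda x: relu(sum(a * b for a, b in zip(w_star, x)))
)

loss_at_truth = empirical_square_loss(w_star, samples)
loss_at_zero = empirical_square_loss([0.0, 0.0, 0.0], samples)
```

In this exactly-labeled setup the loss at `w_star` is zero, while the all-zeros vector incurs the average of the squared labels, which concentrates around 1/2 for a unit-norm `w_star`. With arbitrary labels, the minimum of this quantity over ReLUs plays the role of $\mathsf{opt}$ (at the population level).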
