Time/Accuracy Tradeoffs for Learning a ReLU with respect to Gaussian Marginals

Surbhi Goel, Sushrut Karmalkar, Adam Klivans

Jan-21-2025, 08:28:25 GMT–Neural Information Processing Systems

We consider the problem of computing the best-fitting ReLU with respect to square-loss on a training set when the examples have been drawn according to a spherical Gaussian distribution (the labels can be arbitrary). Let opt < 1 be the population loss of the best-fitting ReLU. We prove: Finding a ReLU with square-loss opt+ϵ is as hard as the problem of learning sparse parities with noise, widely thought to be computationally intractable. This is the first hardness result for learning a ReLU with respect to Gaussian marginals, and our results imply -unconditionally-that gradient descent cannot converge to the global minimum in polynomial time.

algorithm, artificial intelligence, machine learning, (19 more...)

Neural Information Processing Systems

Jan-21-2025, 08:28:25 GMT

Conferences PDF

Add feedback

Country:
- North America
  - Canada (0.46)
  - United States (0.47)

Genre:
- Research Report > New Finding (0.48)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)