Most Neural Networks Are Almost Learnable
Amit Daniely, Nathan Srebro, Gal Vardi
One of the greatest mysteries surrounding deep learning is the discrepancy between its phenomenal capabilities in practice and the fact that, despite a great deal of research, polynomial-time algorithms for learning deep models are known only in very restrictive cases. Indeed, state-of-the-art results can handle only two-layer networks, and only under assumptions on the input distribution and the network's weights. Furthermore, theoretical work shows that even for very naive architectures, learning neural networks is computationally intractable in the worst case. In this paper, we contrast with this theoretical state of affairs and show that, perhaps surprisingly, even though constant-depth networks are completely out of reach from a worst-case perspective, most of them are not as hard as one would imagine. That is, they are learnable in polynomial time, in a distribution-free manner, up to any desired constant accuracy. This is the first polynomial-time approximation scheme (PTAS) for learning neural networks of depth greater than 2 (see the related work section for more details). Moreover, we show that the standard SGD algorithm on a ReLU network can be used as a PTAS for learning random networks. The question of whether random networks can be learned efficiently was posed by Daniely et al. [15], and our work provides a positive result in that respect.
Oct-24-2023
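
As a rough illustration of the algorithmic object the abstract refers to, here is a minimal sketch of plain SGD training of a ReLU network on data labeled by a randomly initialized "teacher" network. The architectures, widths, input normalization, learning rate, and step counts below are illustrative assumptions chosen for the demo; they are not the construction or guarantees analyzed in the paper.

```python
# Illustrative sketch only: vanilla SGD on a small ReLU network, standing in for the
# "standard SGD algorithm on a ReLU network" mentioned in the abstract. All hyperparameters
# here are arbitrary demo choices, not taken from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 20  # input dimension (assumed)

# A random "teacher" ReLU network whose labels the student tries to fit (assumed target).
teacher = nn.Sequential(nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
for p in teacher.parameters():
    p.requires_grad_(False)

# A "student" ReLU network trained with plain SGD on the squared loss.
student = nn.Sequential(nn.Linear(d, 128), nn.ReLU(), nn.Linear(128, 1))
opt = torch.optim.SGD(student.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for step in range(2000):
    # Fresh inputs each step, normalized to the sphere of radius sqrt(d)
    # (a common normalization; the paper's exact input assumptions may differ).
    x = torch.randn(64, d)
    x = x / x.norm(dim=1, keepdim=True) * d**0.5
    y = teacher(x)

    opt.zero_grad()
    loss = loss_fn(student(x), y)
    loss.backward()
    opt.step()

    if step % 500 == 0:
        print(f"step {step:4d}  squared loss {loss.item():.4f}")
```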