
 output neuron



Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy

Neural Information Processing Systems

Understanding when neural networks can be learned efficiently is a fundamental question in learning theory. Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-2 networks can be efficiently learned under the assumptions that the input distribution is Gaussian and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks, and we prove negative results. We show that learning depth-3 ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where random noise is added to the network's parameters. This implies that learning depth-3 ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-2 networks and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results hold under a well-studied assumption on the existence of local pseudorandom generators.
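To make the smoothed-analysis setting concrete, the following is a minimal sketch of perturbing a ReLU network's parameters with random noise and evaluating it on a Gaussian input. The Gaussian form of the noise, its scale `tau`, the helper names `forward` and `smooth`, and the toy weights are all assumptions made for illustration; the abstract does not specify them.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def forward(layers, x):
    """Evaluate a fully connected network given as a list of (W, b) pairs.

    ReLU is applied after every layer except the last one.
    """
    for i, (W, b) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]
        if i < len(layers) - 1:
            x = relu(x)
    return x

def smooth(layers, tau, rng):
    """Return a copy of the parameters with independent N(0, tau^2) noise added.

    Gaussian noise is an assumption of this sketch, not a detail from the abstract.
    """
    return [
        (
            [[w + rng.gauss(0.0, tau) for w in row] for row in W],
            [bi + rng.gauss(0.0, tau) for bi in b],
        )
        for W, b in layers
    ]

# Hypothetical toy parameters: 2 inputs, 2 hidden ReLU units, 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 2.0]], [0.0, -1.0]),
    ([[1.0, 1.0]], [0.5]),
]

rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2)]   # Gaussian input
noisy_layers = smooth(layers, tau=0.01, rng=rng)
y_clean = forward(layers, x)
y_noisy = forward(noisy_layers, x)
```

In this picture, a learner receives examples labeled by the perturbed network rather than by the original, possibly degenerate, one.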




Transition to Linearity of General Neural Networks with Directed Acyclic Graph Architecture

Neural Information Processing Systems

In this paper we show that feedforward neural networks corresponding to arbitrary directed acyclic graphs undergo transition to linearity as their "width" approaches infinity. The width of these general networks is characterized by the minimum indegree of their neurons, except for the input and first layers. Our results identify the mathematical structure underlying transition to linearity and generalize a number of recent works aimed at characterizing transition to linearity or constancy of the Neural Tangent Kernel for standard architectures.
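The notion of width used above can be illustrated with a short sketch that computes the minimum in-degree over all neurons outside the input and first layers. The graph encoding (a dict mapping each neuron to its predecessors), the function name `dag_width`, and the convention that a neuron's layer is the length of the longest path from an input are assumptions of this sketch; the paper's formal definitions may differ.

```python
from functools import lru_cache

def dag_width(preds):
    """Minimum in-degree over neurons beyond the input and first layers.

    preds maps each neuron to the list of neurons feeding into it;
    input neurons map to an empty list.
    """
    @lru_cache(maxsize=None)
    def layer(v):
        # Assumed convention: layer index = longest path length from an input.
        return 0 if not preds[v] else 1 + max(layer(u) for u in preds[v])

    deeper = [v for v in preds if layer(v) >= 2]
    if not deeper:
        raise ValueError("network has no neurons beyond the first layer")
    return min(len(preds[v]) for v in deeper)

# Hypothetical example: two inputs, three first-layer neurons, and a
# non-layered tail in which the output also reads directly from "a".
example = {
    "x1": [], "x2": [],
    "a": ["x1", "x2"], "b": ["x1", "x2"], "c": ["x1", "x2"],
    "d": ["a", "b", "c"],
    "e": ["a", "b"],
    "out": ["d", "e", "a"],
}
width = dag_width(example)
```

Here the neurons beyond the first layer are `d`, `e`, and `out`, with in-degrees 3, 2, and 3, so the sketch reports a width of 2.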


A Missing lemmas for the proof of Theorem 3.1

Neural Information Processing Systems

The following proof is from Daniely and Vardi [15], and we give it here for completeness. By Lemma A.1, there exists a DNF formula ... We construct such an affine layer in Lemma A.2. At least one of the k size-n slices in z contains 0 more than once. We define the outputs of our affine layer as follows. $\Pr[z \text{ represents a hyperedge}] = n(n-1)\cdots(n-k+1)\cdot\left(\frac{1}{n}\right)^{k}\cdots$ $\Pr[z \in Z] \ge \frac{1}{2\log(n)}$.
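The leading factor in the hyperedge probability above is the falling factorial n(n-1)...(n-k+1), which counts ordered k-tuples of distinct indices out of n. The sketch below checks that count by brute force under a simplified model in which each of the k slices independently selects one index uniformly at random. That model, and the function names, are assumptions made for illustration and are not the distribution over z used in the proof.

```python
from fractions import Fraction
from itertools import product

def distinct_index_probability(n, k):
    """Closed form n(n-1)...(n-k+1) / n^k, computed exactly."""
    p = Fraction(1)
    for i in range(k):
        p *= Fraction(n - i, n)
    return p

def brute_force_probability(n, k):
    """Enumerate all n^k index tuples and count those with k distinct entries."""
    total = 0
    hits = 0
    for indices in product(range(n), repeat=k):
        total += 1
        if len(set(indices)) == k:
            hits += 1
    return Fraction(hits, total)

closed_form = distinct_index_probability(5, 3)
enumerated = brute_force_probability(5, 3)
```

For n = 5 and k = 3 both functions give 60/125 = 12/25, since 5 * 4 * 3 = 60 of the 125 possible tuples have pairwise distinct indices.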