learning neural network
Computational Complexity of Learning Neural Networks: Smoothness and Degeneracy
Understanding when neural networks can be learned efficientlyis a fundamental question in learning theory.Existing hardness results suggest that assumptions on both the input distribution and the network's weights are necessary for obtaining efficient algorithms. Moreover, it was previously shown that depth-$2$ networks can be efficiently learned under the assumptions that the input distribution is Gaussian, and the weight matrix is non-degenerate. In this work, we study whether such assumptions may suffice for learning deeper networks and prove negative results. We show that learning depth-$3$ ReLU networks under the Gaussian input distribution is hard even in the smoothed-analysis framework, where a random noise is added to the network's parameters. It implies that learning depth-$3$ ReLU networks under the Gaussian distribution is hard even if the weight matrices are non-degenerate. Moreover, we consider depth-$2$ networks, and show hardness of learning in the smoothed-analysis framework, where both the network parameters and the input distribution are smoothed. Our hardness results are under a well-studied assumption on the existence of local pseudorandom generators.
Learning Neural Networks with Adaptive Regularization
Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another.
Hardness of Learning Neural Networks with Natural Weights
Neural networks are nowadays highly successful despite strong hardness results. The existing hardness results focus on the network architecture, and assume that the network's weights are arbitrary. A natural approach to settle the discrepancy is to assume that the network's weights are ``well-behaved and posses some generic properties that may allow efficient learning. This approach is supported by the intuition that the weights in real-world networks are not arbitrary, but exhibit some ''random-like properties with respect to some ''natural distributions. We prove negative results in this regard, and show that for depth-$2$ networks, and many ``natural weights distributions such as the normal and the uniform distribution, most networks are hard to learn. Namely, there is no efficient learning algorithm that is provably successful for most weights, and every input distribution. It implies that there is no generic property that holds with high probability in such random networks and allows efficient learning.
Learning Neural Networks with Adaptive Regularization
Feed-forward neural networks can be understood as a combination of an intermediate representation and a linear hypothesis. While most previous works aim to diversify the representations, we explore the complementary direction by performing an adaptive and data-dependent regularization motivated by the empirical Bayes method. Specifically, we propose to construct a matrix-variate normal prior (on weights) whose covariance matrix has a Kronecker product structure. This structure is designed to capture the correlations in neurons through backpropagation. Under the assumption of this Kronecker factorization, the prior encourages neurons to borrow statistical strength from one another.
Hardness of Learning Neural Networks under the Manifold Hypothesis
The manifold hypothesis presumes that high-dimensional data lies on or near a low-dimensional manifold. While the utility of encoding geometric structure has been demonstrated empirically, rigorous analysis of its impact on the learnability of neural networks is largely missing. Several recent results have established hardness results for learning feedforward and equivariant neural networks under i.i.d. In this paper, we investigate the hardness of learning under the manifold hypothesis. We ask, which minimal assumptions on the curvature and regularity of the manifold, if any, render the learning problem efficiently learnable.
Reviews: Learning Neural Networks with Adaptive Regularization
Statistical Strength Throughoutt the paper, you refer to the concept of'statistical strength' without describing what it actually means. I expect it means that if two things are correlated, you can estimate properties of them with better sample efficiency if you take this correlation into account, since you're effectively getting more data. Given that two features are correlated, optimization will be improved if you do some sort of preconditioning that accounts for this structure. In other words, given that features are correlated, you want to'share statistical strength.' However, it is less clear to me why you want to regularize the model such that things become correlated/anti-correlated.
Review for NeurIPS paper: Hardness of Learning Neural Networks with Natural Weights
Additional Feedback: In this paper the authors consider the following question: what is an important aspect of a neural network that makes it work well in practice when it's known to be theoretically hard to learn neural networks. There have been many prior works in this direction but most works have proved hardness of learning neural networks under strange architectures or distributions. In this paper their main contribution is to look at "natural" distributions which include for example the uniform distribution and normal distribution and show hardness of learning neural networks even under such natural distributions. Given how important neural networks are in practice, understanding their theoretical underpinning is an important question in ML and this paper shows yet another direction in which it is hard to learn NNs. The main selling point is that prior works looked at arbitrary weights, solely with the purpose of proving lower bounds but in this paper the authors look at natural weights (under which one might have expected to prove positive results) and show that even in this setting, NNs are hard to learn.
Review for NeurIPS paper: Hardness of Learning Neural Networks with Natural Weights
The paper received four reviews. Initially, the scores were borderline but more on the side of accepting. The reviews point out that the paper provides a new style of results on a fundamentally important problem, with interesting techniques. However, one reviewer commented on the result itself being unsurprising; two reviewers were critical of the fact that the result depends on a worst-case input distribution; and some concern was expressed about relation to previous work. The reply from the authors was considered by the reviewers mainly satisfactory, but only partially so regarding the worst-case input distribution.