Goto

Collaborating Authors

 polynomial-time learnability


Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst-case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to {\em eigenvalue decay} of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g.


Reviews: Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

This paper considers the problem of square-loss regression for several function classes computed by neural networks. In contrast to what previous hardness results might suggest, they show that those function classes can be improperly learnt under marginal distribution assumptions alone. Specifically, they use previous structural results of approximate RKHS embedding and assume an eigenvalue decay assumption on the empirical gram matrix (namely, it is approximately low-rank). They then apply a recursive Nystrom sampling method to obtain an approximate and compressed version of the gram matrix, which is then used instead in the learning. The error of the resulting regression is then bounded using known compression generalization bounds, hence obtaining generalization bounds for kernelized regression using Compression Schemes.


Eigenvalue Decay Implies Polynomial-Time Learnability for Neural Networks

Neural Information Processing Systems

We consider the problem of learning function classes computed by neural networks with various activations (e.g. ReLU or Sigmoid), a task believed to be computationally intractable in the worst-case. A major open problem is to understand the minimal assumptions under which these classes admit provably efficient algorithms. In this work we show that a natural distributional assumption corresponding to {\em eigenvalue decay} of the Gram matrix yields polynomial-time algorithms in the non-realizable setting for expressive classes of networks (e.g. We make no assumptions on the structure of the network or the labels.