On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective
Neural Information Processing Systems
We consider training over-parameterized two-layer neural networks with Rectified Linear Units (ReLU) using the gradient descent (GD) method. Inspired by a recent line of work, we study the evolution of the network prediction errors across GD iterations, which can be neatly described in matrix form. When the network is sufficiently over-parameterized, these matrices individually approximate {\em an} integral operator that is determined by the feature vector distribution $\rho$ only. Consequently, the GD method can be viewed as {\em approximately} applying the powers of this integral operator to the underlying/target function $f^*$ that generates the responses/labels. We show that if $f^*$ admits a low-rank approximation with respect to the eigenspaces of this integral operator, then the empirical risk decreases to this low-rank approximation error at a linear rate that is determined by $f^*$ and $\rho$ only, i.e., the rate is independent of the sample size $n$.
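To make the setting concrete, the sketch below (not the paper's code) trains an over-parameterized two-layer ReLU network by full-batch gradient descent on a squared loss and tracks the empirical risk across iterations; the width, step size, feature distribution, and target function are illustrative assumptions chosen only to mirror the setup described in the abstract.

```python
import numpy as np

# Minimal sketch, assuming a standard NTK-style parameterization:
# f(x) = (1/sqrt(m)) * sum_k a_k * relu(w_k . x),
# with the first-layer weights W trained by full-batch GD and the output
# signs a_k held fixed. All hyperparameters below are illustrative.

rng = np.random.default_rng(0)

n, d, m = 200, 10, 5000          # samples, input dim, hidden width (m >> n: over-parameterized)
lr, steps = 1.0, 500             # GD step size and number of iterations

# Feature vectors drawn from a distribution rho (here: uniform on the unit sphere).
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)

# Responses generated by an underlying target function f* (illustrative choice).
y = np.sin(X @ rng.normal(size=d))

W = rng.normal(size=(m, d))      # first-layer weights (trained)
a = rng.choice([-1.0, 1.0], m)   # fixed output weights

def predict(X, W, a):
    m = W.shape[0]
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

for t in range(steps):
    pre = X @ W.T                          # (n, m) pre-activations
    err = predict(X, W, a) - y             # prediction errors across the sample
    # Gradient of the empirical risk (1/2n) * sum_i (f(x_i) - y_i)^2 w.r.t. W.
    grad_W = ((err[:, None] * (pre > 0) * a[None, :]).T @ X) / (n * np.sqrt(m))
    W -= lr * grad_W                       # full-batch GD step
    if t % 100 == 0:
        print(f"iter {t:4d}  empirical risk {0.5 * np.mean(err ** 2):.6f}")
```

Under heavy over-parameterization, the per-iteration update of the error vector `err` is governed by an $n \times n$ matrix that, as the abstract notes, approximates a fixed integral operator depending only on $\rho$, which is why the decay of the printed empirical risk is essentially insensitive to $n$.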