Reviews: Gradient Dynamics of Shallow Univariate ReLU Networks
Neural Information Processing Systems
In this paper, the authors study a neural network with one hidden layer and a linear output layer. The activation function is ReLU, and each hidden unit has a trainable bias term. The input is one-dimensional, i.e., a scalar, so the regression task is equivalent to interpolation. The loss is the squared loss, and the network weights are learned by gradient descent. The authors consider an over-parametrized regime in which the number of hidden units tends to infinity, so the network is effectively infinitely wide. Two main learning regimes are analyzed in the paper: the kernel regime and the adaptive regime.
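The setup described above can be sketched in code. The following is a minimal illustration under assumed details (finite width, toy data, plain full-batch gradient descent), not the authors' implementation: a shallow univariate ReLU network f(x) = Σᵢ vᵢ·relu(wᵢ·x + bᵢ) with trainable biases, fit to scalar data under squared loss.

```python
import numpy as np

# Sketch of the setup in the paper (assumed details, not the authors' code):
# a one-hidden-layer ReLU network on scalar inputs, squared loss,
# full-batch gradient descent. Width is finite here as a stand-in for
# the infinite-width limit studied in the paper.

rng = np.random.default_rng(0)
n_hidden = 50
w = rng.normal(size=n_hidden)                       # input weights
b = rng.normal(size=n_hidden)                       # trainable hidden biases
v = rng.normal(size=n_hidden) / np.sqrt(n_hidden)   # output weights

# Toy 1-D regression (interpolation) data.
x = np.linspace(-1.0, 1.0, 20)
y = np.sin(np.pi * x)

def forward(x):
    z = np.outer(x, w) + b          # pre-activations, shape (n_samples, n_hidden)
    return np.maximum(z, 0.0) @ v   # ReLU, then linear readout

init_loss = np.mean((forward(x) - y) ** 2)

lr = 0.01
for _ in range(2000):
    z = np.outer(x, w) + b
    h = np.maximum(z, 0.0)          # ReLU activations
    pred = h @ v
    err = pred - y                  # d(loss)/d(pred), up to a constant factor
    # Gradients of the mean squared loss w.r.t. all parameters.
    grad_v = h.T @ err / len(x)
    mask = (z > 0).astype(float)    # ReLU derivative
    grad_w = ((err[:, None] * mask * v) * x[:, None]).sum(0) / len(x)
    grad_b = (err[:, None] * mask * v).sum(0) / len(x)
    v -= lr * grad_v
    w -= lr * grad_w
    b -= lr * grad_b

final_loss = np.mean((forward(x) - y) ** 2)
```

In the kernel regime the hidden-layer parameters (w, b) barely move from initialization and only the readout effectively adapts, whereas in the adaptive regime the hidden units' breakpoints -bᵢ/wᵢ move to fit the data; the sketch above trains all parameters and so can exhibit either behavior depending on initialization scale.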