AITopics | learning relus

Learning ReLUs via Gradient Descent

Neural Information Processing Systems, Nov-21-2025, 16:09:05 GMT

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x}\mapsto \max(0,\langle \mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We show that projected gradient descent, when initialized at $\mathbf{0}$, converges at a linear rate to the planted model with a number of samples that is optimal up to numerical constants. Our results on the dynamics of convergence of these very shallow neural nets may provide some insights towards understanding the dynamics of deeper architectures.
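A minimal Python sketch of projected gradient descent in this setting, assuming the closed set is the (nonconvex) set of $s$-sparse vectors, whose Euclidean projection is hard thresholding. The sparse constraint, the loss normalization $\frac{1}{2n}\sum_k(\mathrm{ReLU}(\langle\mathbf{w},\mathbf{x}_k\rangle)-y_k)^2$, the step size of 1, the iteration count, the $\pm 1$ entries of the planted vector, and the convention that the ReLU slope at 0 is 1/2 (without it the gradient at the initialization $\mathbf{0}$ would be exactly zero) are illustrative assumptions, not details taken from the paper. All function names are hypothetical.

```python
import math
import random


def relu(z):
    return z if z > 0.0 else 0.0


def relu_slope(z):
    # Assumed convention: slope 1/2 at z == 0, so the gradient at the
    # initialization w = 0 is not identically zero.
    if z > 0.0:
        return 1.0
    if z < 0.0:
        return 0.0
    return 0.5


def hard_threshold(v, s):
    """Euclidean projection onto s-sparse vectors: keep the s largest |v_j|."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:s]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def projected_gd(X, y, s, step=1.0, iters=100):
    """Projected GD on (1/2n) * sum_k (relu(<w, x_k>) - y_k)^2, started at w = 0."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(iters):
        grad = [0.0] * d
        for x_k, y_k in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, x_k))
            coef = (relu(z) - y_k) * relu_slope(z) / n
            for j in range(d):
                grad[j] += coef * x_k[j]
        # Gradient step followed by projection onto the constraint set.
        w = hard_threshold([wj - step * gj for wj, gj in zip(w, grad)], s)
    return w


def make_problem(n, d, s, seed=0):
    """Realizable model: N(0, I) inputs, labels from a planted s-sparse w* with +-1 entries."""
    rng = random.Random(seed)
    w_star = [0.0] * d
    for j in rng.sample(range(d), s):
        w_star[j] = rng.choice([-1.0, 1.0])
    X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    y = [relu(sum(a * b for a, b in zip(w_star, x))) for x in X]
    return X, y, w_star


# Usage: n = 60 observations, dimension d = 100, sparsity s = 2.
X, y, w_star = make_problem(n=60, d=100, s=2, seed=1)
w_hat = projected_gd(X, y, s=2)
err = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_hat, w_star)))
print(f"recovery error: {err:.2e}")
```

The usage lines pick $n = 60 < d = 100$ to mirror the high-dimensional regime. Whether a particular draw is recovered depends on $n$, $d$, $s$, and the seed, so treat the printed error as something to inspect rather than a guaranteed result. Hard thresholding is only one admissible projection; using a different closed set changes only the projection call.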

gradient descent, learning relus, name change

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.67)


Learning ReLUs via Gradient Descent

Mahdi Soltanolkotabi

Neural Information Processing Systems, Nov-21-2025, 13:21:45 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, deep learning, machine learning

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Reviews: Learning ReLUs via Gradient Descent

Neural Information Processing Systems, Oct-8-2024, 11:12:40 GMT

The analysis assumes that the input samples are i.i.d. Gaussian and that the output is realizable, i.e. the target value $y_k$ for input $\mathbf{x}_k$ is constructed as $y_k = \mathrm{ReLU}(\langle \mathbf{w}^*, \mathbf{x}_k\rangle)$. The paper studies a regression setting and uses least squares as the loss function.
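For reference, the objects the review mentions can be written out as below. This is a sketch in our own notation: the $\frac{1}{2n}$ normalization, the standard normal law $\mathcal{N}(\mathbf{0},\mathbf{I}_d)$ for the inputs, the step size $\mu$, the projection $\mathcal{P}_{\mathcal{K}}$ onto the constraint set $\mathcal{K}$, and the convention $\operatorname{sgn}(0)=0$ (which makes the ReLU slope $1/2$ at zero, so the gradient at $\mathbf{w}_0=\mathbf{0}$ is not trivially zero) are assumptions rather than quotations from the paper.

```latex
\begin{aligned}
y_k &= \mathrm{ReLU}(\langle \mathbf{w}^*, \mathbf{x}_k\rangle) = \max(0, \langle \mathbf{w}^*, \mathbf{x}_k\rangle),
  \qquad \mathbf{x}_k \overset{\text{i.i.d.}}{\sim} \mathcal{N}(\mathbf{0}, \mathbf{I}_d), \quad k = 1, \dots, n,\\
\mathcal{L}(\mathbf{w}) &= \frac{1}{2n} \sum_{k=1}^{n} \bigl(\mathrm{ReLU}(\langle \mathbf{w}, \mathbf{x}_k\rangle) - y_k\bigr)^2,\\
\nabla \mathcal{L}(\mathbf{w}) &= \frac{1}{n} \sum_{k=1}^{n} \bigl(\mathrm{ReLU}(\langle \mathbf{w}, \mathbf{x}_k\rangle) - y_k\bigr)\,
  \frac{1 + \operatorname{sgn}(\langle \mathbf{w}, \mathbf{x}_k\rangle)}{2}\, \mathbf{x}_k,\\
\mathbf{w}_{t+1} &= \mathcal{P}_{\mathcal{K}}\bigl(\mathbf{w}_t - \mu \nabla \mathcal{L}(\mathbf{w}_t)\bigr), \qquad \mathbf{w}_0 = \mathbf{0}.
\end{aligned}
```

Because the labels are realizable, every residual vanishes at $\mathbf{w} = \mathbf{w}^*$, so $\nabla\mathcal{L}(\mathbf{w}^*) = \mathbf{0}$; and since $\mathbf{w}^* \in \mathcal{K}$ implies $\mathcal{P}_{\mathcal{K}}(\mathbf{w}^*) = \mathbf{w}^*$, the planted vector is a fixed point of the projected update.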

gradient descent, learning relus, review

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.45)


Learning ReLUs via Gradient Descent

Mahdi Soltanolkotabi

Neural Information Processing Systems, Oct-4-2024, 08:32:13 GMT

We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector.
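In projected gradient descent the closed set $\mathcal{K}$ enters only through a Euclidean projection $\mathcal{P}_{\mathcal{K}}(\mathbf{v}) \in \arg\min_{\mathbf{w}\in\mathcal{K}} \|\mathbf{w}-\mathbf{v}\|_2$. Closedness guarantees that a nearest point exists; for a nonconvex set it need not be unique. The Python sketch below gives projections for three hypothetical kinds of side information (a norm bound, a known norm, and nonnegative weights); these sets and function names are our illustrative choices, not ones the paper singles out.

```python
import math


def project_ball(v, radius):
    """Projection onto the closed convex set {w : ||w||_2 <= radius}."""
    norm = math.sqrt(sum(vj * vj for vj in v))
    if norm <= radius:
        return list(v)
    return [vj * radius / norm for vj in v]


def project_sphere(v, radius):
    """Projection onto the closed nonconvex set {w : ||w||_2 = radius}.

    Every point of the sphere is equally close to v = 0, so the projection is
    not unique there; this sketch rejects that input instead of picking one.
    """
    norm = math.sqrt(sum(vj * vj for vj in v))
    if norm == 0.0:
        raise ValueError("projection onto the sphere is not unique at the origin")
    return [vj * radius / norm for vj in v]


def project_nonnegative(v):
    """Projection onto the closed convex orthant {w : w_j >= 0 for all j}."""
    return [max(vj, 0.0) for vj in v]


v = [3.0, -4.0, 0.5]
print(project_ball(v, 1.0), project_sphere(v, 2.0), project_nonnegative(v))
```

Convex sets such as the ball or the orthant have a unique projection everywhere; the sphere illustrates the nonconvex case the abstract explicitly allows, where ties must be broken by some rule.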

arxiv preprint arxiv, probability, sgn

Neural Information Processing Systems

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.28)
North America > United States > California > Los Angeles County > Long Beach (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)


Learning ReLUs via Gradient Descent

Soltanolkotabi, Mahdi

Neural Information Processing Systems, Feb-14-2020, 09:27:14 GMT

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x}\mapsto \max(0,\langle \mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We show that projected gradient descent, when initialized at $\mathbf{0}$, converges at a linear rate to the planted model with a number of samples that is optimal up to numerical constants.
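Why the high-dimensional regime calls for side information: when $n < d$ there is a nonzero $\mathbf{v}$ orthogonal to every input $\mathbf{x}_k$, and then $\langle \mathbf{w}^*+\mathbf{v},\mathbf{x}_k\rangle = \langle \mathbf{w}^*,\mathbf{x}_k\rangle$ for all $k$, so $\mathbf{w}^*+\mathbf{v}$ fits the labels exactly and the unconstrained least-squares problem cannot single out $\mathbf{w}^*$. The Python sketch below (hypothetical helper names; the $\frac{1}{2n}$ loss normalization and the sizes $n=10$, $d=30$ are our choices) builds such a $\mathbf{v}$ with modified Gram-Schmidt and compares the loss at both points.

```python
import random


def relu(z):
    return max(0.0, z)


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def loss(w, X, y):
    """Least-squares loss (1/2n) * sum_k (relu(<w, x_k>) - y_k)^2."""
    return sum((relu(dot(w, x)) - y_k) ** 2 for x, y_k in zip(X, y)) / (2 * len(X))


def orthogonal_direction(X, rng):
    """Random unit vector orthogonal to every row of X (possible when n < d)."""
    basis = []
    for x in X:  # orthonormalize the inputs
        u = list(x)
        for b in basis:
            c = dot(u, b)
            u = [ui - c * bi for ui, bi in zip(u, b)]
        norm = dot(u, u) ** 0.5
        basis.append([ui / norm for ui in u])
    v = [rng.gauss(0.0, 1.0) for _ in range(len(X[0]))]
    for b in basis:  # remove every component lying in the span of the inputs
        c = dot(v, b)
        v = [vi - c * bi for vi, bi in zip(v, b)]
    norm = dot(v, v) ** 0.5
    return [vi / norm for vi in v]


rng = random.Random(0)
n, d = 10, 30  # fewer observations than unknowns
w_star = [rng.gauss(0.0, 1.0) for _ in range(d)]
X = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
y = [relu(dot(w_star, x)) for x in X]
w_alt = [a + 3.0 * b for a, b in zip(w_star, orthogonal_direction(X, rng))]
print(loss(w_star, X, y), loss(w_alt, X, y))
```

Both losses are zero up to floating-point rounding even though $\|\mathbf{w}_{\text{alt}} - \mathbf{w}^*\|_2 = 3$; a constraint set that excludes such directions is what makes recovery possible.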

artificial intelligence, gradient descent, machine learning

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.65)


Learning ReLUs via Gradient Descent

Soltanolkotabi, Mahdi

Neural Information Processing Systems, Dec-31-2017

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x}\mapsto \max(0,\langle \mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We show that projected gradient descent, when initialized at $\mathbf{0}$, converges at a linear rate to the planted model with a number of samples that is optimal up to numerical constants. Our results on the dynamics of convergence of these very shallow neural nets may provide some insights towards understanding the dynamics of deeper architectures.
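One way to see why an initialization at $\mathbf{0}$ is workable: if the ReLU slope at zero is taken to be $1/2$ (an assumption made here for illustration), the gradient of the loss $\frac{1}{2n}\sum_k(\mathrm{ReLU}(\langle\mathbf{w},\mathbf{x}_k\rangle)-y_k)^2$ at $\mathbf{w}=\mathbf{0}$ is $-\frac{1}{2n}\sum_k y_k\mathbf{x}_k$, so the first step moves along $\frac{1}{n}\sum_k y_k\mathbf{x}_k$. For standard Gaussian inputs its expectation is $\mathbf{w}^*/2$: write $\mathbf{x} = z\,\mathbf{w}^*/\|\mathbf{w}^*\|_2 + \mathbf{x}_\perp$ with $z\sim\mathcal{N}(0,1)$ independent of $\mathbf{x}_\perp$, so that $\mathbb{E}[\mathrm{ReLU}(\langle\mathbf{w}^*,\mathbf{x}\rangle)\,\mathbf{x}] = \mathbf{w}^*\,\mathbb{E}[z^2\mathbf{1}\{z>0\}] = \mathbf{w}^*/2$. The Monte Carlo sketch below (hypothetical function name, standard normal inputs assumed) checks this identity numerically; it is an illustration, not the paper's convergence argument.

```python
import random


def relu(z):
    return max(0.0, z)


def first_step_direction(w_star, n, seed=0):
    """Monte Carlo estimate of (1/n) * sum_k y_k x_k, with x_k ~ N(0, I_d) and y_k = relu(<w*, x_k>)."""
    rng = random.Random(seed)
    d = len(w_star)
    acc = [0.0] * d
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = relu(sum(a * b for a, b in zip(w_star, x)))
        for j in range(d):
            acc[j] += y * x[j]
    return [a / n for a in acc]


w_star = [1.0, -2.0, 0.5]
estimate = first_step_direction(w_star, n=20000)
# The population value is w_star / 2 = [0.5, -1.0, 0.25]; the estimate fluctuates around it.
print(estimate)
```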

artificial intelligence, machine learning, probability

Neural Information Processing Systems

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.86)


Learning ReLUs via Gradient Descent

Soltanolkotabi, Mahdi

arXiv.org Machine Learning, Aug-1-2017

In this paper we study the problem of learning Rectified Linear Units (ReLUs), which are functions of the form $\mathbf{x}\mapsto \max(0,\langle \mathbf{w},\mathbf{x}\rangle)$ with $\mathbf{w}\in\mathbb{R}^d$ denoting the weight vector. We study this problem in the high-dimensional regime where the number of observations is smaller than the dimension of the weight vector. We assume that the weight vector belongs to some closed set (convex or nonconvex) which captures known side information about its structure. We focus on the realizable model where the inputs are chosen i.i.d. from a Gaussian distribution and the labels are generated according to a planted weight vector. We show that projected gradient descent, when initialized at $\mathbf{0}$, converges at a linear rate to the planted model with a number of samples that is optimal up to numerical constants. Our results on the dynamics of convergence of these very shallow neural nets may provide some insights towards understanding the dynamics of deeper architectures.

artificial intelligence, machine learning, probability

arXiv.org Machine Learning

arXiv:1705.04591

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.70)

Technology: