AITopics | relu net

relu net

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Boris Hanin, David Rolnick

Neural Information Processing Systems | Feb-13-2026, 01:55:55 GMT

Neural Information Processing Systems http://nips.cc/

activation region, neuron, theorem 5, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Boris Hanin

Neural Information Processing Systems | Feb-12-2026, 07:06:42 GMT

In this article, we continue this line of investigation.

artificial intelligence, evgp, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

39555391eb0624a439c5131b1bb8a2e0-AuthorFeedback.pdf

Neural Information Processing Systems | Feb-11-2026, 22:23:32 GMT

dependence, hanin and sellke, miller and hardt, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Which Neural Net Architectures Give Rise to Exploding and Vanishing Gradients?

Boris Hanin

Neural Information Processing Systems | Nov-20-2025, 14:33:09 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, evgp, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > Canada > Quebec > Montreal (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deep ReLU Networks Have Surprisingly Few Activation Patterns

Boris Hanin, David Rolnick

Neural Information Processing Systems | Oct-3-2025, 06:47:20 GMT

In this article, we attempt to capture the difference between the maximum complexity of deep networks and the complexity of functions that are actually learned (see Figure 1).
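The gap between conceivable and realized complexity can be probed numerically. The following is a minimal sketch, not the authors' experiment: it assumes a small fully connected ReLU net with random Gaussian weights, a two-dimensional input, and a uniform grid on [-1, 1]^2, and it counts the distinct on/off states of the hidden neurons seen on that grid. The function name `count_activation_patterns` and all parameter choices are hypothetical, and the count is only a lower bound on the number of patterns inside the square, since a finite grid can miss small regions.

```python
import numpy as np

def count_activation_patterns(widths, grid=200, seed=0):
    """Count distinct hidden-neuron on/off patterns of a random ReLU net
    over a grid of 2-D inputs (assumed setup, for illustration only)."""
    rng = np.random.default_rng(seed)
    dims = [2, *widths]

    # Grid of inputs covering the square [-1, 1]^2, one point per row.
    axis = np.linspace(-1.0, 1.0, grid)
    h = np.stack(np.meshgrid(axis, axis), axis=-1).reshape(-1, 2)

    states = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        W = rng.normal(0.0, np.sqrt(2.0 / fan_in), size=(fan_in, fan_out))
        b = rng.normal(0.0, 0.1, size=fan_out)
        pre = h @ W + b
        states.append(pre > 0)        # which neurons are active at each point
        h = np.maximum(pre, 0.0)      # ReLU

    # One row per input point, one column per hidden neuron.
    patterns = np.concatenate(states, axis=1).astype(np.uint8)
    return len(np.unique(patterns, axis=0))

widths = [6, 6, 6]
n_patterns = count_activation_patterns(widths)
print(n_patterns, "patterns observed;", 2 ** sum(widths), "on/off combinations are conceivable")
```

With 18 hidden neurons there are 2^18 conceivable on/off combinations, but a two-dimensional input can realize only a small fraction of them, which is the kind of gap the article studies.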

activation region, neuron, theorem 5, (16 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Brazos County > College Station (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > Canada (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

39555391eb0624a439c5131b1bb8a2e0-AuthorFeedback.pdf

Neural Information Processing Systems | Oct-2-2025, 13:24:20 GMT

artificial intelligence, hanin and sellke, machine learning, (17 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.32)

Half-Space Feature Learning in Neural Networks

Yadav, Mahesh Lorik, Ramaswamy, Harish Guruprasad, Lakshminarayanan, Chandrashekar

arXiv.org Artificial Intelligence | Apr-5-2024

There currently exist two extreme viewpoints on neural network feature learning: (i) neural networks simply implement a kernel method (a la NTK), and hence no features are learned; (ii) neural networks can represent (and hence learn) intricate hierarchical features suitable for the data. We argue in this paper that neither interpretation is likely to be correct, based on a novel viewpoint. Neural networks can be viewed as a mixture of experts, where each expert corresponds to a path through a sequence of hidden units, with path length equal to the number of layers. We use this alternate interpretation to motivate a model, called the Deep Linearly Gated Network (DLGN), which sits midway between deep linear networks and ReLU networks. Unlike deep linear networks, the DLGN is capable of learning non-linear features (which are then linearly combined), and unlike ReLU networks these features are ultimately simple: each feature is effectively an indicator function for a region compactly described as an intersection of as many half-spaces in the input space as there are layers. This viewpoint allows for a comprehensive global visualization of features, unlike the local visualizations for neurons based on saliency/activation/gradient maps. We show that feature learning does happen in DLGNs, and that its mechanism is the learning of half-spaces in the input space that contain smooth regions of the target function. Due to the structure of DLGNs, the neurons in later layers are fundamentally the same as those in earlier layers (they all represent a half-space); however, the dynamics of gradient descent impart a distinct clustering to the later layer neurons. We hypothesize that ReLU networks also have similar feature learning behaviour.
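The "indicator of an intersection of half-spaces" description of a feature can be made concrete with a small sketch. This is not the paper's DLGN implementation: it assumes hard 0/1 gates and takes one effective half-space per layer as given, and the names `path_feature`, `normals`, and `offsets` are hypothetical.

```python
import numpy as np

def path_feature(x, normals, offsets):
    """Indicator of the intersection of the half-spaces
    {x : normals[l] @ x + offsets[l] > 0}, one half-space per layer."""
    x = np.atleast_2d(x)
    inside = x @ normals.T + offsets > 0   # shape (n_points, n_layers)
    return inside.all(axis=1).astype(float)

# Three assumed half-spaces in the plane: x1 > 0, x2 > 0, x1 + x2 < 1.
normals = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
offsets = np.array([0.0, 0.0, 1.0])

points = np.array([[0.2, 0.3], [0.8, 0.8], [-0.1, 0.5]])
features = path_feature(points, normals, offsets)
print(features)   # 1.0 only for points inside the triangle
```

Only the first point lies in all three half-spaces, so only its feature is switched on; a model of this kind would then combine many such indicators linearly.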

dlgn, neural network, overlap kernel, (12 more...)

arXiv.org Artificial Intelligence

2404.04312

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.35)

Generalization Performance of Empirical Risk Minimization on Over-parameterized Deep ReLU Nets

Lin, Shao-Bo, Wang, Yao, Zhou, Ding-Xuan

arXiv.org Artificial Intelligence | Feb-28-2023

In this paper, we study the generalization performance of global minima for implementing empirical risk minimization (ERM) on over-parameterized deep ReLU nets. Using a novel deepening scheme for deep ReLU nets, we rigorously prove that there exist perfect global minima achieving almost optimal generalization error bounds for numerous types of data under mild conditions. Since over-parameterization is crucial to guarantee that the global minima of ERM on deep ReLU nets can be realized by the widely used stochastic gradient descent (SGD) algorithm, our results indeed fill a gap between optimization and generalization.

artificial intelligence, deep relu net, machine learning, (16 more...)

arXiv.org Artificial Intelligence

2111.14039

Country:

North America > United States (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > China > Shaanxi Province > Xi'an (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Catapult Dynamics and Phase Transitions in Quadratic Nets

Meltzer, David, Liu, Junyu

arXiv.org Artificial Intelligence | Jan-18-2023

Neural networks trained with gradient descent can undergo non-trivial phase transitions as a function of the learning rate. Lewkowycz et al. (2020) discovered that wide neural nets can exhibit a catapult phase for super-critical learning rates, where the training loss grows exponentially quickly at early times before rapidly decreasing to a small value. During this phase the top eigenvalue of the neural tangent kernel (NTK) also undergoes significant evolution. In this work, we prove that the catapult phase exists in a large class of models, including quadratic models and two-layer, homogeneous neural nets. To do this, we show that for a certain range of learning rates the weight norm decreases whenever the loss becomes large. We also empirically study learning rates beyond this theoretically derived range and show that the activation map of ReLU nets trained with super-critical learning rates becomes increasingly sparse as we increase the learning rate.
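The qualitative behaviour described here (loss first growing, weight norm shrinking, loss then collapsing) can be reproduced in a toy setting. The sketch below is an illustration under stated assumptions, not the paper's construction: it uses a two-layer linear model f = u·v/sqrt(m) fitted to a single target of zero, a hand-picked deterministic initialization, and a learning rate chosen between 2/λ₀ and 4/λ₀, where λ₀ = (|u|² + |v|²)/m is the initial tangent-kernel value for this model. The name `run_gd` is hypothetical.

```python
import numpy as np

def run_gd(u, v, lr, steps):
    """Plain gradient descent on L = f^2 / 2 with f = u @ v / sqrt(m)."""
    m = u.size
    losses, weight_norms = [], []
    for _ in range(steps):
        f = u @ v / np.sqrt(m)
        losses.append(0.5 * f ** 2)
        weight_norms.append(u @ u + v @ v)
        grad_u = f * v / np.sqrt(m)
        grad_v = f * u / np.sqrt(m)
        u, v = u - lr * grad_u, v - lr * grad_v
    return np.array(losses), np.array(weight_norms)

m = 100
u0 = np.ones(m)
v0 = np.concatenate([np.ones(55), -np.ones(45)])   # gives f = 1 at initialization

ntk0 = (u0 @ u0 + v0 @ v0) / m    # tangent-kernel value at initialization
lr = 3.0 / ntk0                   # super-critical: above 2/ntk0, below 4/ntk0

losses, weight_norms = run_gd(u0, v0, lr, steps=50)
print("initial loss:", losses[0])
print("peak loss:   ", losses.max(), "at step", int(losses.argmax()))
print("final loss:  ", losses[-1])
print("weight norm: ", weight_norms[0], "->", weight_norms[-1])
```

In this toy run the loss rises well above its initial value during the first few steps, the squared weight norm drops while the loss is large, and the iterates then settle at a point with a much smaller kernel value.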

artificial intelligence, machine learning, quadratic model, (15 more...)

arXiv.org Artificial Intelligence

2301.07737

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > New York > Tompkins County > Ithaca (0.04)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

ReLU nets adapt to intrinsic dimensionality beyond the target domain

Cloninger, Alexander, Klock, Timo

arXiv.org Machine Learning | Aug-6-2020

We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep ReLU networks, where $\phi$ is a nonlinear, geometrically intuitive, and dimensionality-reducing feature map. We focus on two complementary choices for $\phi$ that are intuitive and appear frequently in the statistical literature. The resulting approximation rates are near optimal and show adaptivity to intrinsic notions of complexity, which significantly extends a series of recent works on approximating targets over low-dimensional manifolds. Specifically, we show that ReLU nets can express functions that are invariant to the input up to an orthogonal projection onto a low-dimensional manifold with the same efficiency as if the target domain were the manifold itself. This implies that approximation via ReLU nets is faithful to an intrinsic dimensionality governed by the target $f$ itself, rather than the dimensionality of the approximation domain. As an application of our approximation bounds, we study empirical risk minimization over a space of sparsely constrained ReLU nets under the assumption that the conditional expectation satisfies one of the proposed models. We show near-optimal estimation guarantees in regression and classification problems, for which, to the best of our knowledge, no efficient estimator has been developed so far.

artificial intelligence, machine learning, relu net, (16 more...)

arXiv.org Machine Learning

2008.02545

Country:

North America > United States > California > San Diego County > San Diego (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Norway > Eastern Norway > Oslo (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.45)
