AITopics | two-layer relu network

Collaborating Authors

two-layer relu network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

Town, James, Boursier, Etienne, Lewis, Ben, Englert, Matthias, Lazic, Ranko

arXiv.org Machine LearningMay-27-2026

The successful training of neural networks hinges on the use of first order optimization methods, yet the theoretical characterization of these methods remains incomplete. This is especially true in settings with mild overparameterization. In this work, we study the gradient flow dynamics of two-layer ReLU networks from small initialization with orthogonal training data. We prove the limiting flow converges to a saddle-to-saddle jump process as the initialization scale tends to zero, revealing an incremental learning phenomenon in which a new neuron activates at each saddle. This analysis recovers the known result of Dana et al. (2025, arXiv:2502.16977) that the network interpolates the training data with high probability as soon as $m \gtrsim \log(n)$, where $m$ is the network width and $n$ is the number of training samples. This incremental process characterization also allows us to derive a novel implicit bias result: the learned interpolator has a squared $\ell_2$-norm scaling as $\sqrt{n}$, which is within a constant factor of the minimal $\ell_2$-norm interpolator. More broadly, our work provides the first rigorous proof of an incremental learning process for ReLU networks, whilst suggesting mildly overparameterized networks can converge to interpolating solutions whose complexity is of the same order as that of the optimal interpolator.

artificial intelligence, machine learning, neuron, (19 more...)

arXiv.org Machine Learning

2605.27097

Country: North America (0.15)

Genre: Research Report (0.63)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Neural Information Processing SystemsApr-25-2026, 13:48:16 GMT

In this work, we study the implications of the implicit bias of gradient flow on generalization and adversarial robustness in ReLU networks. We focus on a setting where the data consists of clusters and the correlations between cluster means are small, and show that in two-layer ReLU networks gradient flow is biased towards solutions that generalize well, but are vulnerable to adversarial examples. Our results hold even in cases where the network is highly overparameterized. Despite the potential for harmful overfitting in such settings, we prove that the implicit bias of gradient flow prevents it. However, the implicit bias also leads to non-robust solutions (susceptible to small adversarial ℓ2-perturbations), even though robust networks that fit the data exist.

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)

Add feedback

Invariance . the Initialized

Neural Information Processing SystemsApr-24-2026, 22:46:25 GMT

In this paper, we analyze neural networks trained on high-dimensional data that lies on a low dimen-441 sional linear subspace denoted by P. We assume that the dimension of P is d ℓ. Throughout the pa-442 per it will be more convenient to analyze data which lies on the subspace M = span({e1,...,ed ℓ}),443 because then the "off manifold" directions correspond exactly to certain coordinates of the input. In444 this section we show that we can essentially analyze the data as if it is rotated to lie on M, and it445 would imply the same consequences as the original data from P.446 Theorem A.1. Let P Rd be a subspace of dimension d ℓ, and let M = span{e1,...,ed ℓ}.447 Let R be an orthogonal matrix such that R P = M, let X P be a training dataset and let448 XR = {R x: x X}.

artificial intelligence, experiment, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.90)

Add feedback

0ffd11b5bce666816802b86c77b54cf7-Supplemental-Conference.pdf

Neural Information Processing SystemsFeb-18-2026, 23:37:04 GMT

artificial intelligence, initialization, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)

Add feedback

An Improved Analysis of Training Over-parameterized Deep Neural Networks

Difan Zou, Quanquan Gu

Neural Information Processing SystemsFeb-12-2026, 11:47:24 GMT

Arecent lineofresearch hasshownthatgradient-based algorithms withrandom initialization can converge to the global minima of the training loss for overparameterized (i.e.,sufficiently wide)deepneuralnetworks. However,thecondition onthewidth oftheneural networktoensure theglobal convergence isvery stringent, which is often a high-degree polynomial in the training sample size n (e.g., O(n24)).

artificial intelligence, arxivpreprintarxiv, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Neural Information Processing SystemsFeb-8-2026, 14:04:50 GMT

In a seminal paper, Szegedy et al. [Sze+14] observed that deep networks are extremely vulnerable to adversarial examples, namely, very small perturbations to the inputs

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

Add feedback

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Neural Information Processing SystemsFeb-8-2026, 14:04:46 GMT

In a seminal paper, Szegedy et al. [Sze+14] observed that deep networks are extremely vulnerable to adversarial examples, namely, very small perturbations to the inputs

artificial intelligence, machine learning, optimization problem, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
Asia (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Annihilation of Spurious Minima in Two-Layer ReLU Networks

Neural Information Processing SystemsDec-25-2025, 17:25:51 GMT

We study the optimization problem associated with fitting two-layer ReLU neural networks with respect to the squared loss, where labels are generated by a target network. Use is made of the rich symmetry structure to develop a novel set of tools for studying the mechanism by which over-parameterization annihilates spurious minima through. Sharp analytic estimates are obtained for the loss and the Hessian spectrum at different minima, and it is shown that adding neurons can turn symmetric spurious minima into saddles through a local mechanism that does not generate new spurious minima; minima of smaller symmetry require more neurons. Using Cauchy's interlacing theorem, we prove the existence of descent directions in certain subspaces arising from the symmetry structure of the loss function. This analytic approach uses techniques, new to the field, from algebraic geometry, representation theory and symmetry breaking, and confirms rigorously the effectiveness of over-parameterization in making the associated loss landscape accessible to gradient-based methods. For a fixed number of neurons and inputs, the spectral results remain true under symmetry breaking perturbation of the target.

annihilation, name change, spurious minima, (5 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.78)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.60)

Add feedback

Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces

Neural Information Processing SystemsDec-23-2025, 22:32:39 GMT

Despite a great deal of research, it is still not well-understood why trained neural networks are highly vulnerable to adversarial examples.In this work we focus on two-layer neural networks trained using data which lie on a low dimensional linear subspace.We show that standard gradient methods lead to non-robust neural networks, namely, networks which have large gradients in directions orthogonal to the data subspace, and are susceptible to small adversarial $L_2$-perturbations in these directions.Moreover, we show that decreasing the initialization scale of the training algorithm, or adding $L_2$ regularization, can make the trained network more robust to adversarial perturbations orthogonal to the data.

adversarial example exist, low dimensional linear subspace, two-layer relu network, (5 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

Filters

Collaborating Authors

two-layer relu network

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

Mildly Overparameterized ReLU Networks on Orthogonal Data: Incremental Learning and Implicit Bias

1c26c389d60ec419fd24b5fee5b35796-Supplemental-Conference.pdf

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Invariance . the Initialized

0ffd11b5bce666816802b86c77b54cf7-Supplemental-Conference.pdf

An Improved Analysis of Training Over-parameterized Deep Neural Networks

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

The Double-Edged Sword of Implicit Bias: Generalization vs. Robustness in ReLU Networks

Annihilation of Spurious Minima in Two-Layer ReLU Networks

Adversarial Examples Exist in Two-Layer ReLU Networks for Low Dimensional Linear Subspaces