AITopics | three-layer network

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Neural Information Processing Systems, Apr-25-2026, 21:59:37 GMT

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general-purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure. We instantiate our framework in specific statistical learning settings (single-index models and functions of quadratic features) and show that in the latter setting three-layer networks obtain a sample complexity improvement over all existing guarantees for two-layer networks. Crucially, this sample complexity improvement relies on the ability of three-layer networks to efficiently learn nonlinear features. We then establish a concrete optimization-based depth separation by constructing a function which is efficiently learnable via gradient descent on a three-layer network, yet cannot be learned efficiently by a two-layer network. Our work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.
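A minimal sketch of what layer-wise training of a three-layer network can look like, assuming a ReLU architecture, a tanh-of-quadratic target, a single gradient step on the middle layer, and a ridge-regression fit of the output layer. Every name, width, step size, and the target itself are illustrative assumptions, not the paper's exact algorithm or setting.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m1, m2, n = 8, 64, 64, 512   # input dim, layer widths, sample count (illustrative)

# Hypothetical target: a function of a quadratic feature, y = tanh(x^T A x).
A = rng.standard_normal((d, d)) / d
A = (A + A.T) / 2
X = rng.standard_normal((n, d))
y = np.tanh(np.einsum("ni,ij,nj->n", X, A, X))

def relu(z):
    return np.maximum(z, 0.0)

# Three-layer network f(x) = a^T relu(W relu(V x) + b).
V = rng.standard_normal((m2, d)) / np.sqrt(d)   # first layer, kept at random init
W = np.zeros((m1, m2))                          # middle layer, trained in stage 1
b = rng.standard_normal(m1)                     # middle-layer biases, kept fixed
a = rng.choice([-1.0, 1.0], size=m1) / m1       # output layer, refit in stage 2

def forward(X, W):
    H1 = relu(X @ V.T)      # first-layer features, shape (n, m2)
    Z = H1 @ W.T + b        # middle-layer pre-activations, shape (n, m1)
    return H1, Z

# Stage 1: one gradient step on W for the loss (1/2n) * sum (f(x) - y)^2.
H1, Z = forward(X, W)
resid = relu(Z) @ a - y
grad_Z = resid[:, None] * a[None, :] * (Z > 0)
grad_W = grad_Z.T @ H1 / n
eta = float(m1)             # assumed step size, offsets the 1/m1 output scaling
W = W - eta * grad_W

# Stage 2: freeze V and W, fit the output layer by ridge regression.
_, Z = forward(X, W)
Phi = relu(Z)
lam = 1e-3
a = np.linalg.solve(Phi.T @ Phi / n + lam * np.eye(m1), Phi.T @ y / n)

train_mse = np.mean((Phi @ a - y) ** 2)
baseline_mse = np.mean(y ** 2)   # error of the all-zero predictor
print(f"train MSE {train_mse:.4f} vs zero-predictor MSE {baseline_mse:.4f}")
```

Because W starts at zero in this sketch, the single step makes W rank one, so the middle layer ends up computing ReLU functions of one learned scalar feature and the output layer fits a one-dimensional function of it. That is only a toy illustration of the "learn a feature, then fit on top of it" idea described in the abstract.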

artificial intelligence, machine learning, xtax, (16 more...)

Neural Information Processing Systems

Country: North America > United States > Minnesota (0.27)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang

Neural Information Processing Systems, Feb-12-2026, 09:22:12 GMT

The fundamental learning theory behind neural networks remains largely open. What classes of functions can neural networks actually learn?

artificial intelligence, machine learning, neural network, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

236b6a814a1d2c0ff504ca7bf380f7ff-Paper-Conference.pdf

Neural Information Processing Systems, Feb-8-2026, 23:14:21 GMT

high probability, neural information processing system, neural network, (13 more...)

Neural Information Processing Systems

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (0.45)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Neural Information Processing Systems, Dec-24-2025, 05:31:39 GMT

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general-purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure. We instantiate our framework in specific statistical learning settings (single-index models and functions of quadratic features) and show that in the latter setting three-layer networks obtain a sample complexity improvement over all existing guarantees for two-layer networks. Crucially, this sample complexity improvement relies on the ability of three-layer networks to efficiently learn nonlinear features. We then establish a concrete optimization-based depth separation by constructing a function which is efficiently learnable via gradient descent on a three-layer network, yet cannot be learned efficiently by a two-layer network. Our work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.

nonlinear feature learning, three-layer network, two-layer network, (6 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers

Zeyuan Allen-Zhu, Yuanzhi Li, Yingyu Liang

Neural Information Processing Systems, Oct-2-2025, 20:57:33 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, machine learning, neural network, (14 more...)

Neural Information Processing Systems

Country: North America > United States > Wisconsin (0.14)

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Neural Information Processing Systems, May-26-2025, 17:43:55 GMT

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general-purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure.

artificial intelligence, machine learning, two-layer network, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Neural Information Processing Systems, Oct-10-2024, 11:23:38 GMT

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general-purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure.

nonlinear feature learning, three-layer network, two-layer network, (3 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)

Analytical Solution of a Three-layer Network with a Matrix Exponential Activation Function

Gai, Kuo, Zhang, Shihua

arXiv.org Machine Learning, Jul-1-2024

In practice, deeper networks tend to be more powerful than shallow ones, but this has not been understood theoretically. In this paper, we find an analytical solution of a three-layer network with a matrix exponential activation function, i.e., f(X) = W … Our proof shows the power of depth and the use of a non-linear activation function, since a one-layer network can only solve one equation, i.e., Y = WX. Deep neural networks have become successful in many fields, including computer vision, natural language processing, bioinformatics, etc. However, the mathematical principle of deep learning is still not fully understood, especially why deeper networks with non-linear activation functions tend to be more powerful than shallower ones. It is well known that sufficiently large depth-2 neural networks with reasonable activation functions can approximate any continuous function on a bounded domain (Cybenko, 1989; Funahashi, 1989; Hornik et al., 1989; Barron, 1994; Pinkus, 1999), but this requires the width of the networks to be exponential. Recent authors have shown that some functions can be approximated by deeper networks with fewer neurons than by shallower ones, such as radial functions (Eldan & Shamir, 2016), Boolean circuits (Rossman et al., 2015) or functions induced by neural networks (Telgarsky, 2016).
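The remark that a one-layer network can only solve one equation Y = WX can be illustrated with a small numerical check. The sketch below assumes square, generically invertible data matrices drawn at random (an assumption for illustration, since the snippet above does not state the shapes): the weight matrix is then pinned down by the first pair, so it has no freedom left to fit a second, unrelated pair.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4  # illustrative matrix size

# Two unrelated (input, output) pairs of square matrices.
X1, X2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))
Y1, Y2 = rng.standard_normal((d, d)), rng.standard_normal((d, d))

# One-layer linear network: Y = W X.
# Solving W X1 = Y1 is the same as solving X1^T W^T = Y1^T for W^T.
W = np.linalg.solve(X1.T, Y1.T).T

fits_first = np.allclose(W @ X1, Y1)    # W reproduces the first pair
fits_second = np.allclose(W @ X2, Y2)   # but generically not the second
print("fits first pair:", fits_first, "| fits second pair:", fits_second)
```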

activation function, exp, neural network, (10 more...)

arXiv.org Machine Learning

2407.0254

Country:

Asia > Middle East > Jordan (0.05)
Asia > China > Beijing > Beijing (0.04)
Asia > Vietnam > Long An Province > Tân An (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks

Nichani, Eshaan, Damian, Alex, Lee, Jason D.

arXiv.org Machine Learning, Oct-31-2023

One of the central questions in the theory of deep learning is to understand how neural networks learn hierarchical features. The ability of deep networks to extract salient features is crucial to both their outstanding generalization ability and the modern deep learning paradigm of pretraining and finetuning. However, this feature learning process remains poorly understood from a theoretical perspective, with existing analyses largely restricted to two-layer networks. In this work we show that three-layer neural networks have provably richer feature learning capabilities than two-layer networks. We analyze the features learned by a three-layer network trained with layer-wise gradient descent, and present a general-purpose theorem which upper bounds the sample complexity and width needed to achieve low test error when the target has specific hierarchical structure. We instantiate our framework in specific statistical learning settings (single-index models and functions of quadratic features) and show that in the latter setting three-layer networks obtain a sample complexity improvement over all existing guarantees for two-layer networks. Crucially, this sample complexity improvement relies on the ability of three-layer networks to efficiently learn nonlinear features. We then establish a concrete optimization-based depth separation by constructing a function which is efficiently learnable via gradient descent on a three-layer network, yet cannot be learned efficiently by a two-layer network. Our work makes progress towards understanding the provable benefit of three-layer neural networks over two-layer networks in the feature learning regime.

artificial intelligence, machine learning, neural network, (16 more...)

arXiv.org Machine Learning

2305.06986

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Middle East > Tunisia > Ben Arous Governorate > Ben Arous (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Universal Solutions of Feedforward ReLU Networks for Interpolations

Huang, Changcun

arXiv.org Artificial Intelligence, Dec-11-2022

This paper provides a theoretical framework for the solutions of feedforward ReLU networks for interpolation, in terms of what is called an interpolation matrix. It summarizes, extends, and generalizes our three preceding works, with the expectation that the solutions found in engineering could be included in this framework and finally understood. For three-layer networks, we classify different kinds of solutions and model them in a normalized form; solution finding is investigated along three dimensions (data, networks, and training); and the mechanism of a type of overparameterization solution is interpreted. For deep-layer networks, we present a general result called the sparse-matrix principle, which could describe some basic behavior of deep layers and explain the sparse-activation mode that appears in engineering applications associated with brain science; this principle also shows an advantage of deep layers over shallower ones. As applications, a general solution of deep neural networks for classification is constructed from that principle, and we also use the principle to study the data-disentangling property of encoders. Analogous to the three-layer case, the solution of deep layers is also explored along several dimensions. The mechanism of multi-output neural networks is explained from the perspective of interpolation matrices.

artificial intelligence, equation 7, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2208.07498

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
