Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Wei, Colin, Lee, Jason D., Liu, Qiang, Ma, Tengyu

Neural Information Processing Systems

Recent works have shown that on sufficiently over-parametrized neural nets, gradient descent with relatively large initialization optimizes a prediction function in the RKHS of the Neural Tangent Kernel (NTK). This analysis leads to global convergence results but does not work when there is a standard $\ell_2$ regularizer, which is useful to have in practice. We show that sample efficiency can indeed depend on the presence of the regularizer: we construct a simple distribution in $d$ dimensions which the optimal regularized neural net learns with $O(d)$ samples but the NTK requires $\Omega(d^2)$ samples to learn. To prove this, we establish two analysis tools: i) for multi-layer feedforward ReLU nets, we show that the global minimizer of a weakly-regularized cross-entropy loss is the max normalized margin solution among all neural nets, which generalizes well; ii) we develop a new technique for proving lower bounds for kernel methods, which relies on showing that the kernel cannot focus on informative features. Motivated by our generalization results, we study whether the regularized global optimum is attainable.
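
As a reading aid, the display below sketches the weak-regularization limit that result i) refers to; the notation ($f_\Theta$, $\lambda$, $r$, the homogeneity degree $a$, and $\gamma^\star$) is our own shorthand, and the exact norm and homogeneity conditions are left to the paper.

For a feedforward ReLU net $f_\Theta$ that is positively homogeneous of degree $a$ in its parameters, and data $\{(x_i, y_i)\}_{i=1}^{n}$ with $y_i \in \{\pm 1\}$, consider the weakly-regularized cross-entropy objective
$$L_\lambda(\Theta) \;=\; \sum_{i=1}^{n} \log\!\bigl(1 + e^{-y_i f_\Theta(x_i)}\bigr) \;+\; \lambda\, \|\Theta\|^{r}.$$
Result i) says, roughly, that as $\lambda \to 0$ every global minimizer $\Theta_\lambda$ of $L_\lambda$ attains the maximum normalized margin
$$\gamma^\star \;=\; \max_{\|\Theta\| \le 1} \; \min_{1 \le i \le n} \; y_i f_\Theta(x_i), \qquad \text{in the sense that} \quad \frac{\min_i y_i f_{\Theta_\lambda}(x_i)}{\|\Theta_\lambda\|^{a}} \;\longrightarrow\; \gamma^\star,$$
after which generalization follows from margin-based bounds.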


Reviews: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Neural Information Processing Systems

Summary: The paper studies the generalization and optimization aspects of regularized neural networks and provides two key contributions: (a) it shows an O(d) sample-complexity gap between the global minima of the regularized loss and the induced kernel method; (b) it establishes that for infinite-width two-layer nets, a variant of gradient descent converges to a global minimum of the (weakly) regularized cross-entropy loss in polynomially many iterations. The paper studies a natural and important problem and makes fundamental contributions in this direction. Recent results in deep learning theory exploit the neural tangent kernel connection to prove optimization and generalization results, so it is important to study the limitations of this connection.
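
For concreteness, the sketch below sets up the kind of training the summary describes: a wide two-layer ReLU net fit by full-batch gradient descent on a weakly $\ell_2$-regularized cross-entropy (logistic) loss. Plain gradient descent stands in for the specific variant analyzed in the paper, and the synthetic data, width, step size, and regularization strength are illustrative assumptions, not the paper's construction.

import numpy as np

# Minimal sketch (not the paper's algorithm): a wide two-layer ReLU net trained by
# full-batch gradient descent on a weakly l2-regularized cross-entropy (logistic) loss.
# Width, step size, lambda, and the synthetic data are illustrative assumptions.

rng = np.random.default_rng(0)
d, n, width = 20, 200, 2048
lam, lr, steps = 1e-4, 0.1, 2000          # weak regularizer, step size, iterations

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1] + 1e-12)    # toy labels in {-1, +1} (placeholder data)

W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

def forward(X, W, a):
    H = np.maximum(X @ W.T, 0.0)                 # ReLU hidden activations, shape (n, width)
    return H @ a / np.sqrt(W.shape[0]), H        # f(x) = a^T relu(W x) / sqrt(width)

for t in range(steps):
    f, H = forward(X, W, a)
    margins = y * f
    p = 1.0 / (1.0 + np.exp(np.clip(margins, -30, 30)))   # sigmoid(-y f), clipped for stability
    g_f = -(y * p) / n                                     # d(mean logistic loss) / d f, per example
    grad_a = H.T @ g_f / np.sqrt(width) + 2 * lam * a
    grad_W = ((np.outer(g_f, a) * (H > 0)).T @ X) / np.sqrt(width) + 2 * lam * W
    a -= lr * grad_a
    W -= lr * grad_W

f, _ = forward(X, W, a)
reg = lam * (np.sum(W ** 2) + np.sum(a ** 2))
loss = np.mean(np.logaddexp(0.0, -y * f)) + reg
print(f"regularized loss {loss:.4f}, train accuracy {np.mean(np.sign(f) == y):.3f}")

The $\ell_2$ penalty is deliberately small here, matching the weak-regularization regime in which the max-margin characterization sketched above applies.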


Reviews: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel

Neural Information Processing Systems

This paper investigates how regularization helps in training neural networks, in contrast to the unregularized neural tangent kernel (NTK) method. It is shown that the regularized network captures the "informative signal" while the NTK model does not, which highlights the effectiveness of regularization. Moreover, the paper shows polynomial-time convergence of the gradient flow corresponding to the infinite-width neural network. The contribution is novel and the implications are quite instructive for neural tangent kernel learning. In particular, the lower-bound analysis for kernel learning is a novel contribution.
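
To make the contrast with the kernel side concrete, the sketch below builds an empirical-NTK baseline: the feature map is the parameter gradient of a two-layer ReLU net at random initialization, and the predictor is the minimum-norm least-squares fit in that feature space (equivalently, ridgeless regression with the empirical NTK). The data, width, and scaling choices are illustrative assumptions, not the distribution or kernel construction analyzed in the paper.

import numpy as np

# Sketch of an empirical-NTK baseline: kernel regression with the feature map
# phi(x) = grad_theta f_theta(x) evaluated at a frozen random initialization.
# Data, width, and scalings are illustrative; this is not the paper's construction.

rng = np.random.default_rng(1)
d, n, width = 20, 200, 512

X = rng.standard_normal((n, d))
y = np.sign(X[:, 0] * X[:, 1] + 1e-12)               # toy labels in {-1, +1}

W0 = rng.standard_normal((width, d)) / np.sqrt(d)    # frozen initialization
a0 = rng.standard_normal(width) / np.sqrt(width)

def ntk_features(X, W, a):
    # Gradients of f(x) = a^T relu(W x) / sqrt(width) w.r.t. (a, W); their inner
    # products define the empirical NTK: K(x, x') = phi(x) . phi(x').
    m = W.shape[0]
    H = np.maximum(X @ W.T, 0.0)                      # d f / d a_k   (up to 1/sqrt(m))
    G = (X @ W.T > 0).astype(float) * a               # a_k * 1[w_k . x > 0]
    gradW = (G[:, :, None] * X[:, None, :]).reshape(len(X), -1)   # d f / d W_{kj}, flattened
    return np.concatenate([H, gradW], axis=1) / np.sqrt(m)

Phi = ntk_features(X, W0, a0)
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)       # min-norm fit = "ridgeless" kernel regression
print("empirical-NTK train accuracy:", np.mean(np.sign(Phi @ coef) == y))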

