On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory

May-26-2025, 22:26:13 GMT–Neural Information Processing Systems

This paper discusses the impact of the random initialization of neural networks on neural tangent kernel (NTK) theory, an effect that most recent works on NTK theory ignore. It is well known that as the network's width tends to infinity, a neural network with random initialization converges to a Gaussian process $f^{\mathrm{GP}}$, which takes values in $L^{2}(\mathcal{X})$, where $\mathcal{X}$ is the domain of the data. In contrast, to apply the traditional theory of kernel regression, most recent works introduce a special mirrored architecture and a mirrored (random) initialization that make the network's output identically zero at initialization. It therefore remains an open question whether the conventional and the mirrored initializations lead wide neural networks to exhibit different generalization capabilities. Under the conventional random initialization, the generalization error of a wide neural network trained by gradient descent is $\Omega(n^{-\frac{3}{d+3}})$, where $n$ is the sample size and $d$ is the input dimension, so it still suffers from the curse of dimensionality. Thus, NTK theory may not explain the superior performance of neural networks.
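The contrast between the two initializations can be illustrated with a small sketch. It assumes a hypothetical two-layer ReLU network in NTK scaling, $f(x;\theta) = \frac{1}{\sqrt{m}}\sum_{r=1}^{m} a_r\,\mathrm{ReLU}(\langle w_r, x\rangle)$, with all weights drawn i.i.d. from $N(0,1)$; this toy model and every function name below are illustrative assumptions, not the paper's architecture or code. Under the standard initialization, the output at a unit-norm input has variance exactly $1/2$ for every width (and becomes Gaussian as the width grows), so it does not vanish. The mirrored construction, which subtracts two copies of the same randomly initialized network, outputs exactly zero. The helper `rate_exponent` only evaluates the exponent $3/(d+3)$ from the stated $\Omega(n^{-\frac{3}{d+3}})$ bound.

```python
import copy
import math
import random

# Hypothetical toy model (NOT the paper's exact setup): a two-layer ReLU network
# in NTK scaling, f(x; theta) = (1/sqrt(m)) * sum_r a_r * relu(<w_r, x>),
# with every weight drawn i.i.d. from N(0, 1).


def relu(z):
    return z if z > 0.0 else 0.0


def init_params(width, dim, rng):
    """Standard (non-mirrored) random initialization."""
    w = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a


def forward(params, x):
    """Evaluate the toy network at input x."""
    w, a = params
    s = sum(a_r * relu(sum(wi * xi for wi, xi in zip(w_r, x)))
            for w_r, a_r in zip(w, a))
    return s / math.sqrt(len(a))


def mirrored_init(width, dim, rng):
    """Mirrored initialization: two sub-networks that start from identical weights."""
    theta_1 = init_params(width, dim, rng)
    return theta_1, copy.deepcopy(theta_1)


def mirrored_forward(theta_1, theta_2, x):
    """Mirrored architecture: (f(x; theta_1) - f(x; theta_2)) / sqrt(2)."""
    return (forward(theta_1, x) - forward(theta_2, x)) / math.sqrt(2.0)


def init_output_variance(width, dim, x, trials, seed=0):
    """Monte Carlo estimate of Var[f(x)] over standard random initializations."""
    rng = random.Random(seed)
    outs = [forward(init_params(width, dim, rng), x) for _ in range(trials)]
    mean = sum(outs) / trials
    return sum((o - mean) ** 2 for o in outs) / (trials - 1)


def rate_exponent(d):
    """Exponent in the Omega(n^{-3/(d+3)}) lower bound quoted in the abstract."""
    return 3.0 / (d + 3)


if __name__ == "__main__":
    # Unit-norm input, so <w_r, x> ~ N(0, 1) and Var[f(x)] = E[a^2] * E[relu(z)^2] = 1/2.
    x = (0.6, 0.8, 0.0)
    print("standard init, estimated Var[f(x)]:", init_output_variance(64, 3, x, trials=2000))
    theta_1, theta_2 = mirrored_init(64, 3, random.Random(1))
    print("mirrored init, f(x):", mirrored_forward(theta_1, theta_2, x))
    for d in (1, 10, 100):
        print(f"d = {d:3d}: error lower bound decays like n^-{rate_exponent(d):.3f}")
```

Because the exponent $3/(d+3)$ is $0.75$ for $d = 1$ but only $0.03$ for $d = 97$, the error of the conventionally initialized network cannot decay faster than a very slow power of $n$ in high dimensions, which is the curse of dimensionality the abstract refers to.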

artificial intelligence, machine learning