Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations

Akshay Kumar, Jarvis Haupt

arXiv.org Machine Learning 

Neural networks have achieved remarkable success across various tasks, yet the precise mechanism driving this success remains theoretically elusive. Training a neural network involves optimizing a non-convex loss function, typically with a first-order method such as gradient descent or one of its variants. A particularly puzzling aspect is how these training algorithms succeed in finding solutions with good generalization capabilities despite the non-convexity of the loss landscape. In addition to the choice of training algorithm, the choice of initialization plays a crucial role in determining the performance of the trained network. Indeed, recent works have made the benefits of small initializations increasingly clear, revealing that neural networks trained using (stochastic) gradient descent with small initializations exhibit feature learning [2] and generalize better on various tasks [3, 4, 5]; see Section 2 for more details on the impact of the initialization scale. However, for small initializations, the training dynamics of neural networks are highly non-linear and not yet well understood. Our focus in this paper is on understanding the effect of small initialization on the training dynamics of neural networks. In pursuit of a deeper understanding of the training mechanism under small initializations, researchers have uncovered the phenomenon of directional convergence in the neural network weights during the early phase of training [6, 7]. In [6], the authors study the gradient flow dynamics of training two-layer Rectified Linear Unit (ReLU) neural networks.
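The directional convergence phenomenon described above can be observed numerically. The following is a minimal illustrative sketch, not the experiment from [6]: it trains a two-layer ReLU network f(x) = sum_j a_j ReLU(w_j . x) by plain gradient descent from a very small random initialization (a crude discretization of gradient flow), and tracks how quickly each neuron's weight direction w_j / ||w_j|| stabilizes while the weight norms are still tiny. All data, dimensions, step sizes, and the synthetic teacher targets here are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Hedged sketch: directional convergence of first-layer weights under a
# small initialization. Network: f(x) = sum_j a_j * relu(w_j . x).
rng = np.random.default_rng(0)
n, d, h = 50, 5, 20                               # samples, input dim, width (assumed)
X = rng.standard_normal((n, d))
y = np.maximum(X @ rng.standard_normal(d), 0.0)   # synthetic targets (assumed)

scale = 1e-4                                      # "small initialization" scale
W = scale * rng.standard_normal((h, d))           # first-layer weights w_j
a = scale * rng.standard_normal(h)                # output weights a_j

lr, steps = 1e-2, 4000
prev_dirs = None
for t in range(steps):
    pre = X @ W.T                                 # (n, h) pre-activations
    act = np.maximum(pre, 0.0)                    # ReLU
    resid = act @ a - y                           # residual of the predictions
    # gradients of the loss 0.5 * mean((f(x_i) - y_i)^2)
    grad_a = act.T @ resid / n
    grad_W = ((resid[:, None] * a) * (pre > 0)).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W

    if t % 500 == 0:
        # unit directions of each neuron's weight vector
        dirs = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
        if prev_dirs is not None:
            # mean |cosine| between current and earlier directions:
            # values near 1 indicate the directions have already converged
            drift = np.abs((dirs * prev_dirs).sum(axis=1)).mean()
            print(f"step {t}: ||W|| = {np.linalg.norm(W):.2e}, "
                  f"mean |cos| vs. 500 steps ago = {drift:.3f}")
        prev_dirs = dirs
```

Under a setup like this, one typically sees the per-neuron cosine similarities approach 1 while the overall weight norm remains on the order of the initialization scale, which is the qualitative behavior referred to as early directional convergence.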
