Neuronal Fluctuations: Learning Rates vs Participating Neurons
Pareek, Darsh, Kumar, Umesh, Rao, Ruthu, Janjam, Ravi
Deep Neural Networks (DNNs) rely on inherent fluctuations in their internal parameters (weights and biases) to effectively navigate the complex optimization landscape and achieve robust performance. While these fluctuations are recognized as crucial for escaping local minima and improving generalization, their precise relationship with fundamental hyperparameters remains underexplored. A significant knowledge gap exists concerning how the learning rate, a critical parameter governing the training process, directly influences the dynamics of these neural fluctuations. This study systematically investigates the impact of varying learning rates on the magnitude and character of weight and bias fluctuations within a neural network. We trained a model using distinct learning rates and analyzed the corresponding parameter fluctuations in conjunction with the network's final accuracy. Our findings aim to establish a clear link between the learning rate's value, the resulting fluctuation patterns, and overall model performance. By doing so, we provide deeper insights into the optimization process, shedding light on how the learning rate mediates the crucial exploration-exploitation trade-off during training. This work contributes to a more nuanced understanding of hyperparameter tuning and the underlying mechanics of deep learning.
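The abstract's core measurement, how the magnitude of weight fluctuations varies with the learning rate, can be illustrated with a minimal sketch. The model, data, and function names below are hypothetical (not from the paper): a one-parameter linear model is trained with SGD at two learning rates, and the standard deviation of the late-training weight updates serves as a proxy for fluctuation magnitude.

```python
import numpy as np

def train_and_measure(lr, steps=500, seed=0):
    # Tiny linear model y = w * x trained with single-sample SGD on
    # synthetic data; we record step-to-step weight changes as a proxy
    # for parameter fluctuations (all names here are illustrative).
    rng = np.random.default_rng(seed)
    x = rng.normal(size=100)
    y = 3.0 * x + rng.normal(scale=0.1, size=100)
    w = 0.0
    deltas = []
    for _ in range(steps):
        i = rng.integers(len(x))
        grad = 2.0 * (w * x[i] - y[i]) * x[i]  # d/dw of squared error
        step = -lr * grad
        w += step
        deltas.append(step)
    # Fluctuation magnitude: std of weight updates in the second half
    # of training, after the weight has largely converged.
    return np.std(deltas[steps // 2:])

small = train_and_measure(lr=0.01)
large = train_and_measure(lr=0.1)
print(small, large)  # the larger learning rate yields larger fluctuations
```

Even in this toy setting, the late-training update noise scales with the learning rate, which is the kind of relationship the study quantifies for full networks.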
Understanding the Evolution of the Neural Tangent Kernel at the Edge of Stability
Jiang, Kaiqi, Cohen, Jeremy, Li, Yuanzhi
The study of Neural Tangent Kernels (NTKs) in deep learning has drawn increasing attention in recent years. The NTK typically changes actively during training and is closely tied to feature learning. In parallel, recent work on Gradient Descent (GD) has identified a phenomenon called the Edge of Stability (EoS), in which the largest eigenvalue of the NTK oscillates around a value inversely proportional to the step size. However, although follow-up works have explored the underlying mechanism of this eigenvalue behavior in depth, an understanding of how the NTK eigenvectors behave during EoS is still missing. This paper examines the dynamics of the NTK eigenvectors during EoS in detail. Across different architectures, we observe that larger learning rates cause the leading eigenvectors of the final NTK, as well as the full NTK matrix, to align more strongly with the training target. We then study the underlying mechanism of this phenomenon and provide a theoretical analysis for a two-layer linear network. Our study enhances the understanding of GD training dynamics in deep learning.
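The alignment the abstract describes, between the training target and the leading NTK eigenvectors, is straightforward to compute for a small network. The sketch below is illustrative only (the architecture, sizes, and helper names are assumptions, not the paper's setup): it forms the empirical NTK K = J Jᵀ of a two-layer ReLU network via the parameter Jacobian, then measures what fraction of the target vector's norm lies in the span of the top-k eigenvectors.

```python
import numpy as np

def ntk_matrix(W1, w2, X):
    # Empirical NTK of f(x) = w2 . relu(W1 x): K = J @ J.T, where J is the
    # Jacobian of the n outputs w.r.t. all parameters (illustrative sketch).
    H = X @ W1.T                     # pre-activations, shape (n, h)
    A = np.maximum(H, 0.0)           # ReLU activations
    M = (H > 0).astype(float)        # ReLU derivative mask
    # df/dw2 = relu(W1 x); df/dW1[j, :] = w2[j] * mask[j] * x
    J_w2 = A                                               # (n, h)
    J_W1 = (M * w2[None, :])[:, :, None] * X[:, None, :]   # (n, h, d)
    J = np.concatenate([J_w2, J_W1.reshape(len(X), -1)], axis=1)
    return J @ J.T

def target_alignment(K, y, k=1):
    # Fraction of the target's squared norm captured by the top-k
    # eigenvectors of the NTK; 1.0 means y lies entirely in their span.
    vals, vecs = np.linalg.eigh(K)   # eigh returns eigenvalues ascending
    top = vecs[:, -k:]
    proj = top.T @ y
    return float(proj @ proj / (y @ y))

rng = np.random.default_rng(0)
n, d, h = 16, 4, 32
X = rng.normal(size=(n, d))
y = rng.normal(size=n)
W1 = rng.normal(size=(h, d)) / np.sqrt(d)
w2 = rng.normal(size=h) / np.sqrt(h)
K = ntk_matrix(W1, w2, X)
alignment = target_alignment(K, y, k=3)
print(alignment)  # a value in [0, 1]
```

Tracking this alignment for the NTK over training, at different step sizes, is the kind of measurement behind the paper's observation that larger learning rates push the leading eigenvectors toward the target; at EoS the top eigenvalue itself is expected to hover near 2/η for step size η.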