Implicit Bias of (Stochastic) Gradient Descent for Rank-1 Linear Neural Network

Jan-19-2025, 19:59:25 GMT–Neural Information Processing Systems

Studying the implicit bias of gradient descent (GD) and stochastic gradient descent (SGD) is critical to unveiling the mechanisms underlying deep learning. Unfortunately, even for standard linear networks in the regression setting, a comprehensive characterization of the implicit bias remains an open problem. This paper investigates a new proxy model for the standard linear network, the rank-1 linear network, in which each weight matrix is parameterized in rank-1 form. For the over-parameterized regression problem, we precisely analyze the implicit bias of GD and SGD: we identify a "potential" function such that GD converges to its minimizer subject to zero training error (i.e., an interpolation solution), and we further characterize how the noise introduced by SGD perturbs the form of this potential. Our results explicitly connect the depth of the network and the initialization with the implicit bias of GD and SGD.
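
As a point of reference for what "GD converges to a particular interpolation solution" means, the sketch below shows the classical single-layer case only, not the paper's rank-1 network or its potential function. It assumes plain least-squares regression with more parameters than samples and zero initialization, where GD is known to reach the minimum-L2-norm interpolator. The function name `gd_least_squares`, the problem sizes, and the step count are illustrative assumptions.

```python
import numpy as np

def gd_least_squares(X, y, steps=5000):
    """Plain GD on 0.5 * ||X w - y||^2, started from w = 0 (assumed setup)."""
    n, d = X.shape
    w = np.zeros(d)  # zero init keeps every iterate in the row space of X
    # Step size 1/L, where L is the largest eigenvalue of X X^T (= that of X^T X).
    lr = 1.0 / np.linalg.eigvalsh(X @ X.T).max()
    for _ in range(steps):
        w -= lr * X.T @ (X @ w - y)  # gradient of the squared loss
    return w

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))   # 5 samples, 20 parameters: over-parameterized
y = rng.standard_normal(5)

w_gd = gd_least_squares(X, y)
w_min_norm = np.linalg.pinv(X) @ y  # minimum-L2-norm interpolator

print("training residual:", np.linalg.norm(X @ w_gd - y))
print("distance to min-norm solution:", np.linalg.norm(w_gd - w_min_norm))
```

In this baseline the "potential" is simply the squared L2 norm of the weights; the abstract's claim is that for rank-1 linear networks the corresponding potential depends on depth and initialization.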

implicit bias, linear network, rank-1 linear neural network

Genre:
- Research Report > New Finding (0.41)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)