Implicit Bias of Gradient Descent on Linear Convolutional Networks

Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nati Srebro

Neural Information Processing Systems 

Large-scale neural networks used in practice are highly over-parameterized, with far more trainable model parameters than training examples. Consequently, optimization objectives for learning such high-capacity models have many global minima that fit the training data perfectly. However, minimizing the training loss using a specific optimization algorithm takes us not to just any global minimum, but to a special one, e.g., a global minimum that also minimizes some regularizer R(β). In over-parameterized models, especially deep neural networks, much, if not most, of the inductive bias of the learned model comes from this implicit regularization by the optimization algorithm. Understanding the implicit bias, e.g., by characterizing R(β), is thus essential for understanding how and what the model learns.
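
As a concrete illustration of this phenomenon (a minimal sketch, not taken from the paper): in the simplest over-parameterized setting, an under-determined least-squares problem, gradient descent initialized at zero converges to the minimum ℓ2-norm solution among all interpolating solutions, i.e., the implicit regularizer is R(β) = ‖β‖₂. The problem sizes, step size, and iteration count below are illustrative choices.

```python
import numpy as np

# Under-determined least squares: n = 5 examples, d = 20 parameters,
# so infinitely many beta satisfy X @ beta = y exactly.
rng = np.random.default_rng(0)
n, d = 5, 20
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on the squared loss, initialized at beta = 0.
# Every gradient lies in the row space of X, so the iterates stay
# in that subspace and converge to the min-norm interpolant.
beta = np.zeros(d)
lr = 0.01
for _ in range(100_000):
    grad = X.T @ (X @ beta - y)
    beta -= lr * grad

# The minimum-l2-norm solution, computed directly via the pseudoinverse.
beta_min_norm = np.linalg.pinv(X) @ y

print(np.allclose(X @ beta, y))                  # True: fits training data
print(np.linalg.norm(beta - beta_min_norm))      # ~0: GD picked the min-norm global minimum
```

Among the many global minima of the training loss, gradient descent selects this particular one without any explicit penalty term; characterizing the analogous R(β) for richer parameterizations, such as linear convolutional networks, is the subject of this paper.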