Ridge Rider: Finding Diverse Solutions by Following Eigenvectors of the Hessian

Oct-9-2024, 11:07:17 GMT–Neural Information Processing Systems

Over the last decade, a single algorithm has changed many facets of our lives - Stochastic Gradient Descent (SGD). In the era of ever decreasing loss functions, SGD and its various offspring have become the go-to optimization tool in machine learning and are a key component of the success of deep neural networks (DNNs). While SGD is guaranteed to converge to a local optimum (under loose assumptions), in some cases it may matter which local optimum is found, and this is often context-dependent. Examples frequently arise in machine learning, from shape-versus-texture-features to ensemble methods and zero-shot coordination. In these settings, there are desired solutions which SGD on standard' loss functions will not find, since it instead converges to theeasy' solutions.

diverse solution, hessian, ridge rider, (5 more...)

Neural Information Processing Systems

Oct-9-2024, 11:07:17 GMT

Conferences Web Page

Add feedback

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (0.62)
  - Neural Networks (0.62)