The Sample Complexity of Gradient Descent in Stochastic Convex Optimization

Apr-10-2024–arXiv.org Artificial Intelligence

Stochastic Convex Optimization (SCO) is a theoretical model that depicts a learner that minimizes a (Lipschitz) convex function, given finite noisy observations of the objective [22]. While often considered simplistic, in recent years SCO has become a focus of theoretical research, partly, because of its importance to the study of first-order optimization methods. But, also, it has become focus of study because it is one of few theoretical settings that exhibit overparameterized learning. In more detail, classical learning theory often focuses on the tension between number of samples, or training data, and the complexity of the model to be learnt. A common wisdom of classical theories [1, 7, 14, 24] is that, to avoid overfitting, the complexity of a model should be adjusted in proportion to the amount of training data. However, recent advances in Machine Learning have challenged this viewpoint. Evidently [18, 25], state-of-the-art algorithms generalize well but without, explicitly, controlling the capacity of the model to be learnt. In turn, today, it is one of the most emerging challenges, for learning theory, to understand learnability when the number of parameters in a learnt model exceeds the number of examples, and when, seemingly, nothing withholds the algorithm from overfitting. Towards this, we look at SCO.

algorithm, oracle, sample complexity, (13 more...)

arXiv.org Artificial Intelligence

Apr-10-2024

arXiv.org PDF

Add feedback

Country:
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Computational Learning Theory (0.68)
  - Statistical Learning > Gradient Descent (0.51)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found