Bridging the Gap between Constant Step Size Stochastic Gradient Descent and Markov Chains

Dieuleveut, Aymeric, Durmus, Alain, Bach, Francis

Jul-20-2017–arXiv.org Machine Learning

We consider the minimization of an objective function given access to unbiased estimates of the function gradients. This key methodological problem has raised interest in different communities: in large-scale machine learning (Bottou and Bousquet, 2008; Shalev-Shwartz et al., 2009, 2007), optimization (Nemirovski et al., 2009; Nesterov and Vial, 2008), and stochastic approximation (Kushner and Yin, 2003; Polyak and Juditsky, 1992; Ruppert, 1988). The most widely used algorithms are stochastic gradient descent(SGD), a.k.a. Robbins-Monro algorithm(Robbins and Monro, 1951), and some of its modifications based on averaging of the iterates (Polyak and Juditsky, 1992; Rakhlin et al., 2011; Shamir and Zhang, 2013). While the choice of the step-size may be done robustly in the deterministic case (see, e.g., Bertsekas, 1995), this remains a traditional theoretical and practical issue in the stochastic case. Indeed, early work suggested to use step-size decaying with the number k of iterations as O(1/k) (Robbins and Monro, 1951), but it appeared to be non-robust to ill-conditioning and slower decays such as O(1/ k) together with averaging lead to both good practical and theoretical performance (Bach, 2014). We consider in this paper constant step-size SGD, which is often used in practice. Although the algorithm is not converging in general to the global optimum of the objective function, constant step-sizes come with benefits: (a) there is single parameter value to set as opposed to the several choices of parameters to deal with decaying step-sizes, e.g., as 1/( k)

artificial intelligence, equation, machine learning, (17 more...)

arXiv.org Machine Learning

Jul-20-2017

arXiv.org PDF

Add feedback

Country:
- North America > United States (0.46)
- Europe (0.46)

Genre:
- Research Report (0.82)

Technology:
- Information Technology > Artificial Intelligence > Machine Learning
  - Statistical Learning > Gradient Descent (1.00)
  - Learning Graphical Models > Undirected Networks
    - Markov Models (0.41)

Duplicate Docs Excel Report

Title
None found

Similar Docs Excel Report more

Title	Similarity	Source
None found